ABSTRACT


Throughout history people have searched for a means of predicting the outcomes 
of battles. Data analysis is a way of understanding the factors associated with battle 
outcomes. There are objective factors, such as force ratio, and subjective factors, such as 
leadership, that affect battles. Subjective factors are hard to determine and thus are 
usually avoided in models. Here, nationality is investigated as a surrogate for subjective 
factors. That is, we want to see how nationality is associated with battle outcomes by 
exploring the best available data set on historical land combat—developed by the Center 
for Army Analysis. We focus on four countries for which there is sufficient data: the 
USA, Germany, Britain and Israel. We find that these countries historically use a 
substantial amount of military power to defeat their enemies. In particular, the USA 
often has overwhelming force. Using classification tree models, with a correct 
classification rate of 79 percent, the results suggest that nationality was the most 
important factor in battles before World War I and the second most important factor 
during the World Wars. Force ratio was the most important factor in WWI and artillery 
ratio in WWII. In the years following WWII, the dominant variable has been air force ratio.
EXECUTIVE SUMMARY


Throughout history, predicting the outcome of a battle before it starts has been a 
main concern of soldiers, historians and analysts. Different tools have been used to make 
predictions. Two of the most important and commonly used are simulation and data analysis.


Simulations, built on mathematical models such as the Lanchester equations, are becoming
increasingly important as computer technology advances. Recent developments in computer
technologies and new algorithms have made simulations very capable and reliable, but there
are still pitfalls. For example, it is difficult to model intangibles such as leadership and
training, and these factors can be just as important as a soldier’s weapon.


Another tool is data analysis. It is widely used and has been producing quite
satisfactory results. Moreover, unlike simulations, it is possible to use intangibles in data
analysis models. In this work, we use data analysis, and our area of interest is nationality
factors. In other words, do different nations have different characteristics that affect the
outcome of a battle? If there are nationality factors, what are they? Do they change over
time? Can we use them to predict the outcome of a potential battle?


In our analyses, we used the CDB90G data set, developed for the Center for Army
Analysis (CAA). This is the best data set available on historical land combat. This data
set was first prepared by the Historical Evaluation and Research Organization (HERO) in
1983, and we are using the version with the latest updates. The CDB90G includes 657
battles from 1600 to the end of the 20th century. There are up to 152 attributes listed for
each battle.


Numerous people have worked with this data set, including some NPS Master’s
students. These researchers looked at different aspects of warfare and tried to answer
different questions. The first analysis was done by CAA, under the Combat History
Analysis Effort (CHASE) beginning in 1984. Afterwards, Dupuy [Ref. 2] tried to model
warfare without using advanced analysis techniques and formed the Quantified Judgment
Model. Hartley built his Oak Ridge Spreadsheet Battle Model, which allows the user to
predict the outcome of a potential conflict using an Excel spreadsheet [Ref. 1]. Yigit
looked at the famous rule of thumb that an attacker with greater than a 3:1 force ratio
wins, and questions such as “How successful are the attackers? Do attackers suffer more
casualties?” [Ref. 3]. Coban [Ref. 4] used classification trees to build a model which
predicts the outcome of a potential battle.


Among the works mentioned above, Hartley’s claimed that nationality factors 
should have an important role in modeling warfare. There is another work on this subject 
which is of interest to us. Prior to the Gulf War, a British analyst, David Rowland, made 
accurate predictions about the results of the war, relying heavily on nationality factors
[Ref. 5].


To do the analyses, we divide the data set into four subsets by time period, because the
nature of warfare changes over time and battles within each period have similar
characteristics. The first subset, battles before World War I (WWI),
covers the battles from 1600 to the beginning of WWI. The second subset is the battles in
WWI, the third subset is the battles in WWII, and the last subset is the battles after
WWII. We also focus on four countries, the USA, Germany, Britain and Israel, because
more data are available in the data set on these countries than on the others.


Our first analysis is done with the objective variables, namely force ratio, tank
ratio, artillery ratio, air force ratio and cavalry ratio. These come from hard data. That is,
the values for these variables can actually be collected from the battlefield. We use
boxplots to show the data structure and Wilcoxon’s rank sum test to compare different
hypotheses relating to the objective variables. We find that the USA has usually
accumulated great power on the battlefield. Especially in WWII, the air force and tank
ratios of the USA are overwhelming, almost incomparable to those of its enemies. We see
little or no difference between Germany and Britain, and we usually do not
see a statistically significant difference between the ratios of countries when they won or
lost. Among the countries, Israel has the smallest figures for all objective variables
except air force ratio.


We also examine the relative variables. These variables, such as initiative,
leadership, and training, come from soft data, i.e., their values are decided by the
judgment of historians. Therefore, they are subjective and are usually avoided in models.
Our analyses showed that the data set does not have useful information for these
variables. For most of the battles, neither side was deemed to have an advantage with
respect to relative variables and the countries had similar patterns. Only Israel has a
different pattern than the other countries. For training, leadership and combat
effectiveness, Israel has an obvious advantage over its opponents in the battles it fought.


We used classification trees in our final analyses to see if the nationality factors 
were important. Tree-based modeling is an exploratory technique for uncovering 
structure in data and is useful for summarizing large multivariate datasets [Ref. 7]. Trees
do not need distributional assumptions, and interactions between variables are
automatically included in the tree structure. In addition, they are robust to outlying data.
One of the advantages of tree-based models is that they are easy to read. There are oval
(non-terminal or split) and rectangular (terminal) nodes. Each node contains the predicted
outcome and the distribution to the child nodes. The split criterion is shown on each
branch.


The first model consists of the battles prior to WWI (Figure 1). In this period, tree 
models show that nationality was the most important variable, that is, the first split 
criterion is nationality. The second important variable is force ratio. This model explains 
76 percent of the battles, that is, the model classifies 76 percent of the battles correctly. 
The second model, the battles in WWI, showed that nationality is the second most 
important variable after force ratio. The second model explained 79 percent of the battles. 
The third model was built with the information on the battles in WWII. Nationality is the 
second important variable after artillery ratio. This model also explained 79 percent of the 
battles. The last model, the battles after WWII, consists of the battles in which Israel
participated. The only variable that appears in this model is the air force ratio. To
evaluate the importance of nationality in our tree models, we fit the models with and
without nationality factors and compared the misclassification rates of the two.
Nationality factors improved the accuracy of the model for the battles prior to WWI, but
the improvement was insignificant for the models for WWI and WWII, and nationality did
not appear in the last model.





Figure 1. Tree Model for the Battles Before WWI.


Figure 1 is our first model which includes the battles before World War I. 
The most important factor is nationality. If the defender is from one of the 
following, the USA, Britain, the Confederate States or Germany, the 
model suggests predicting a win for the defender. If we were to predict an 
outcome of a hypothetical battle in this period, we could predict the result 
only by looking at the nationality of the countries and we would be correct 
71 percent of the time. After nationality, the single most important 
variable is force ratio. 


Coban [Ref. 4] found that relative variables were the most important factors 
before WWI. Our model for that period, without using any relative variables, explained 
76 percent of the battles versus Coban’s [Ref. 4] 79 percent. This shows that we can 
replace the relative variables with the nationality variable, and still have a reasonably
good model, at least for this data. The nationality variable is fully objective, because
nationality is known before the war starts, whereas the relative variable values are very
difficult to determine, and vary from analyst to analyst. The models for the other periods
did not show the nationality variables to be the most important factor. However, combining
the results from all other analyses with the results of our classification trees, we conclude
that having sufficient military power on the battlefield is a nationality factor for all four
countries.
I. INTRODUCTION


A. INTRODUCTION 

When one reads about history, it is mostly the history of wars. It is not too much of a
stretch to say that wars shaped our history, and will continue to be one of the most
important phenomena shaping the future of the world. Because of this importance, a lot
of effort has been, and is being, devoted to exploring “the art of war”. One of the main
areas of interest has always been predicting the outcome of a battle before the first bullet
flies. Related to, and probably more important than, this question is “what relates to
winning?” As discussed below, many researchers, using different tools, have tried to
answer this question.


Simulation is one of the tools used to make predictions about potential battles. 
Simulations are often built on mathematical models, such as Lanchester equations [Ref. 
4]. In the past, capabilities of simulations were somewhat limited, but with improvements 
in computer technology, much more capable simulations are available today. In the end, 
though, simulations are simplifications of combat, and it has proven difficult to model
intangible factors, such as leadership, morale, and training [Refs. 1, 2, 3], which,
according to many other studies, greatly affect combat outcomes.


Another tool that analysts use to understand the nature of warfare is data analysis. 
The main challenge with data analysis is finding reliable, useful data. Furthermore, the 
data need to be detailed and large enough to find reliable answers. To some extent, we 
also have this problem, but the data set used in this research is considered to be the best 
data set available on historical land battles. In this work, the CDB90G data set is used. It 
is an updated version of the data set consisting of historical data prepared by the 
Historical Evaluation and Research Organization (HERO) in 1983, which includes battles
from 1600 through the Arab-Israeli wars towards the end of the 20th century.


In 1983, the U.S. Concepts Analysis Agency (CAA) contracted the Historical 
Evaluation and Research Organization (HERO) to build a data set of historical combat 
comprising 601 battles. CDB90G, the updated version, consists of 657 battles. There are
up to 152 attributes listed for each battle.


Numerous people have worked with this data set, including some NPS Master’s 
students. These researchers looked at different aspects of warfare and tried to answer 
different questions. The first analysis was done by CAA, under the Combat History 
Analysis Effort (CHASE) beginning in 1984. Afterwards, Dupuy [Ref. 2] tried to model 
warfare without using advanced analysis techniques and formed the Quantified Judgment 
Model. Hartley built his Oak Ridge Spreadsheet Battle Model, which allows the user to 
predict the outcome of a potential conflict using an Excel spreadsheet [Ref. 1]. Yigit 
looked at the famous rule of thumb that an attacker with greater than a 3:1 force ratio 
wins, and questions such as “How successful are the attackers? Do attackers suffer more
casualties?” [Ref. 3]. Coban [Ref. 4] used classification trees to build a model which
predicts the outcome of a potential battle.


Coban’s work in particular is interesting since it uses the relatively new data
analysis method of classification trees to model combat. Tree-based models have certain
advantages over traditional linear models. They are usually easier to discuss and interpret
than linear models, and the treatment of missing values (NAs) is more satisfactory with
tree-based models than with linear models [Ref. 9:p. 378]. Because they are easier to
understand, the models built can readily be used by people who are not very knowledgeable
in the subject.


Among the works mentioned above, Hartley’s claimed that nationality factors 
should have an important role in modeling warfare. Prior to his research, the other works 
discussed did not emphasize the importance of nationality factors as much as did Hartley. 
There is another example of the importance of the nationality factors from a British 
analyst, David Rowland. Prior to the Gulf War, among all the predictions on the outcome 
of the operation, his was reportedly the most accurate [Ref. 5]. Rowland used nationality 
factors as his main variable. Before the campaign started, while most analysts were 
estimating that the battle would last for months and cost thousands of allied lives [Ref. 6], 
he predicted that it would be easy, and came up with figures close to what happened.
These two works, especially the second one, motivated this thesis.


The hypothesis that this thesis investigates is that a phenomenon called
“Nationality Factor” exists; that is, every nation has its own characteristics, which also
affect its military. For example, people think that the Germans have a long military
tradition and are good fighters. Indeed, Dupuy estimates that one German soldier had
more combat effectiveness than two Soviet soldiers [Ref. 2]. Japan has its own fighting
class, the samurai, who have hundreds of years of tradition, which makes the country’s
military unique and different from other countries. In Turkey, being a soldier is special. It
is said that “Every Turk is born as a soldier,” and it is a great honor for a person and his
family to die in battle. Many more examples can be found. It is an undeniable fact that
there is more to winning than having more weapons or superior tactics or perhaps even
better training. It is interesting to look at battles where the side with apparently less
power was the victor, not only once, but many times. The recent Arab-Israeli Wars are a
clear example where an outgunned side (Israel) repeatedly won.


The purpose of this thesis is to search for the presence of a “Nationality Factor”
and find its effects, if any, using the CDB90G data set.
B. BACKGROUND 

1. Trevor Dupuy 

A retired U.S. Army colonel, Trevor N. Dupuy, founded the Historical Evaluation
and Research Organization (HERO), which constructed most of the data set used in this
thesis. After finishing the data set, he analyzed it himself. The
main product was the Quantified Judgment Model (QJM). The formulas in the QJM
are only a little more complicated than basic arithmetic. A main feature of QJM is the
use of OLI values. OLI stands for Operational Lethality Index, which is a weapon’s
maximum effect under ideal conditions [Ref. 2:p. 30]. The effects of the battlefield, i.e.,
the changes from “ideal” conditions, are represented by different variables. The combat
power computation is built upon the corrected OLI factors (with the effects of these
variables included), and as an end product, the outcome value R (for result) is calculated
for both sides. If R_f - R_e (R_f: result friendly, R_e: result enemy) is positive, the model
predicts that the friendly side wins, and vice versa. Analysis is done to calculate the
variable values and effects. Nationality factors are not extensively used [Ref. 2].


2. Dean Hartley 

In his book, “Predicting Combat Effects”, Hartley analyzed the original HERO
dataset to determine whether there are any consistent formulae for predicting combat
effects. The results proved to be positive and were incorporated in a spreadsheet model
that predicts battle outcomes, including attrition, duration, advance, and victory.


Attrition, at the gross level, is determined to follow neither the Lanchester Square Law
nor the Lanchester Linear Law. Instead, it follows a law between the Linear Law and the
Logarithmic Law. See Equation (1).

$$\frac{dE}{dt} \propto -E^{0.75} F^{0.40}, \qquad \frac{dF}{dt} \propto -F^{0.75} E^{0.40} \tag{1}$$

where

E = enemy manpower,
F = friendly manpower,
t = time.
More extensively than the other works, Hartley used nationality factors as one of
the more important variables, and they appear in many of his computations. For example,
the predicted log duration of a battle is [Ref. 1:p. 95]:

PFLDURA2 = .31 + .24*AIRPL + .0000083*STARDAT - 157*TEMP + .00043*RXODP
- .00047*ABAIYART + .000054*LWIDYART + .91*ATVAL + .96*DEVAL

where ATVAL is:


if ATTACKER = "Arabs" then ATVAL = 0.5 
if ATTACKER = "Austria" then ATVAL = 0.2 
if ATTACKER = "England" then ATVAL = 0.2 
if ATTACKER = "European" then ATVAL = 0.4 
if ATTACKER = "France" then ATVAL = 0.0 
if ATTACKER = "Germany" then ATVAL = 0.0 
if ATTACKER = "Israel" then ATVAL = 0.3 
if ATTACKER = "Italy" then ATVAL = 0.8 
if ATTACKER = "Japan" then ATVAL = 0.0 
if ATTACKER = "Other" then ATVAL = -0.1 
if ATTACKER = "Russia" then ATVAL = 0.2 
if ATTACKER = "USA" then ATVAL = 0.0 
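
The ATVAL rules above amount to a lookup table keyed by attacker nationality. A minimal Python sketch of that lookup follows. The function names are hypothetical, and falling back to the "Other" constant for an unlisted nationality is an assumption, not something stated by Hartley. Only the .91*ATVAL term of the duration formula is reproduced here.

```python
# Attacker nationality constants (ATVAL), copied from the rule list above.
ATVAL = {
    "Arabs": 0.5, "Austria": 0.2, "England": 0.2, "European": 0.4,
    "France": 0.0, "Germany": 0.0, "Israel": 0.3, "Italy": 0.8,
    "Japan": 0.0, "Other": -0.1, "Russia": 0.2, "USA": 0.0,
}


def attacker_value(attacker: str) -> float:
    """Return the ATVAL constant for an attacker.

    Assumption: nationalities not in the list use the "Other" constant.
    """
    return ATVAL.get(attacker, ATVAL["Other"])


def atval_term(attacker: str, coefficient: float = 0.91) -> float:
    """Contribution of the .91*ATVAL term to the predicted log duration."""
    return coefficient * attacker_value(attacker)
```

For example, `atval_term("Arabs")` gives the 0.91 * 0.5 contribution that an Arab attacker adds to the predicted log duration.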
By the same token, nationalities have different constants in almost every
calculation. Thus, they greatly affect the end result.


Despite the extensive usage and its benefits, using nationality factors is still
considered suspect. The reason is the changing nature of national identity. It is not fixed
and certainly does change over time. Once it was the Romans who ruled the world.
France enjoyed military superiority from the time of Napoleon to the Franco-Prussian
War, which it lost. In another example, the Ottomans were the main power for
centuries and then they became “the sick man of Europe” [Ref. 1].

3. Faruk Yigit

Yigit [Ref. 3] explored CAA’s revised version of the HERO database, the 
CDB90FT. This dataset consists of 660 battles and engagements with up to 140 different
attributes on each. Yigit analyzed the 3:1 force ratio rule of thumb, the dispersion rate,
and the daily casualty rate. He divided the data into chronological subsets and analyzed 
each subset. He concluded that force ratio was a reasonable predictor of outcomes. For 
example, a force ratio of 3 to 1 or greater leads an attacker to victory 68 percent of the 
time. Some of his other findings are that greater dispersion of combat troops is a reason 
for the decrease in casualties despite an increase in weapon lethality, and casualty rates of 
the attacker are almost always lower than those of the defender. 

4. Muzaffer Coban 

Coban [Ref. 4], using the latest version of the data set, CDB90G, used
classification trees to build models that predict the outcomes of potential battles.
Tree-based methods may be unfamiliar to some analysts, although many researchers like them
since they present an attractive way to express knowledge and aid in decision making
[Ref. 9:p. 251]. Coban looked at pre-selected variables, which he thought had more of an
effect on the outcome of battle. These variables were analyzed using
descriptive statistics and conditional plots. The pre-selected variables were:


• Objective variables: force ratio, tank ratio, artillery ratio, cavalry ratio,
the attacker’s primary tactical scheme, and the defender’s primary
defensive posture.

• Relative variables: relative surprise, relative air superiority in the theater,
relative combat effectiveness, relative leadership advantage, relative
training advantage, relative morale advantage, relative logistics advantage,
relative momentum advantage, relative intelligence advantage, relative
technology advantage, relative initiative advantage.

• Terrain and weather variables: three terrain factors and five weather
factors.


The descriptive statistics and conditional plots revealed the association of the 
variables with the outcome of battles. The descriptive statistics revealed that the objective 
variables are not highly correlated with victory. Some of the relative variables, such as
leadership, have a strong relationship with victory. However, relative variables are
subjective and based on historical judgment.


Using these variables, three tree-based models were considered. Model 1, with 
only the objective variables, resulted in high misclassification rates. This result was
consistent with the descriptive statistics, which showed that objective variables
alone are not sufficient to classify battle outcomes. Model 2, with both objective and
relative variables, had relatively low misclassification rates. Model 3 used terrain and
weather variables, as well as the objective and relative variables. However, the resulting
classification trees did not include the terrain and weather variables, and the
misclassification rates were no better than those of Model 2.


Coban conducted another analysis to understand the historical trends in battles.
Multiple classification trees were built by using the objective and relative variables with
training set sizes of 125. Each classification tree was built with a training set of 125
battles and used to predict the battle immediately following those 125 in the data set. Then,
another classification tree was built with the next 125 battles, with an overlap of 124
battles. At the end, 658-125=533 classification trees were built and 533 predictions made.
This analysis revealed some important results. First, the importance of variables has changed
throughout history. Second, the misclassification rates show that past battles failed to
predict the battles of World War II, in which new tactics and weapons were introduced
[Ref. 4].
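
The sliding-window scheme described above can be sketched as an index generator. This is only an illustration of the windowing arithmetic, not Coban's actual code: the function name is hypothetical, indices are 0-based by assumption, and the tree fitting itself is omitted. The count of 658 is taken from the figure quoted in the text.

```python
from typing import Iterator, Tuple


def rolling_windows(n_battles: int, window: int = 125) -> Iterator[Tuple[range, int]]:
    """Yield (training indices, index of the battle to predict).

    Each window of `window` consecutive battles is used to predict the
    battle that immediately follows it; consecutive windows are shifted
    by one battle, so they overlap in window - 1 battles.
    """
    for start in range(n_battles - window):
        yield range(start, start + window), start + window


# 658 is the count used in the text's 658-125=533 calculation.
windows = list(rolling_windows(658))
n_trees = len(windows)
overlap = len(set(windows[0][0]) & set(windows[1][0]))
```

Here `n_trees` reproduces the 533 trees and predictions, and `overlap` reproduces the 124-battle overlap between consecutive training sets.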


In his thesis, Coban concluded that: 


The predictions of battle outcomes using classification trees revealed as 
high as 79 percent correct (clear-cut outcomes). This result is satisfying 
when the role of luck in battles and hard to quantify factors are considered. 
[Ref. 4] 


The reference to hard-to-quantify factors is the most interesting part of this conclusion,
and it became the topic of this thesis.


It is always a challenge to work with intangibles. How can you measure things 
such as leadership or morale, especially before a conflict? Can nationality factors be a
surrogate for these?


In the CDB90G data set, there are values for the intangible variables as well, but 
since the purpose is to predict the outcome of a war, we need data before the war, not 
afterwards. However, I have a different point of view. I hypothesize that nations have 
their own characteristics, which are force multipliers. A good thing about this particular 
variable is that although it is a soft factor, the nationality factor, unlike other soft factors, 
is “objective”. The purpose of this work is to ascertain if nationality factors correlate with 
the outcome of battles above and beyond other variables. What are the nationality factors 
and can we really talk about them? If the answer is yes, do they change over time? Can 
we come up with a reasonable method to use nationality factors in predicting the outcome
of a battle?
II. SUMMARY STATISTICS


A. INTRODUCTION 
This section explores and summarizes the data set using simple analysis 
techniques. Our purpose is to establish a good fundamental understanding of the data set
before actually doing analysis with classification trees.


To address the purpose of the thesis, that is, to determine the effect of nationality
on the outcome of a potential battle, the data are analyzed with respect to different
nationalities. To do this, we split the data set into subsets by
nationality. The variable “nationA”, the nationality of the attacking force, is used
as the classifying variable. In addition, “nationD”, the nationality of the defending force,
is also used when necessary.


One of the questions we want to answer is whether nationality factors change over
time. To address this particular aspect, following Coban [Ref. 4], the data set is divided
into six different time periods. These time periods reflect important changes in history. In
each period, war was conducted differently than in the others, in that new technologies or
new tactics were used. The battles within each period have similar properties. The first
division is made at 1755, and therefore, the first period is 1600 to 1755. The Thirty
Years’ War falls within the first period. The year 1756 marked the beginning of the Seven
Years’ War, which was the largest of the pre-Napoleonic Wars in the data set. The second
period is from 1756 to 1814, and includes the Seven Years’ War and the Napoleonic Wars.
This was the period of great European powers, extensive usage of black powder and big
sailing ships. The year 1815 marks the fall of Napoleon, and the beginning of a new era. In
this period, from 1815 to 1914, a big portion of the data is from the American Civil War.
This period ends in 1914, the beginning of World War I (WWI). The years 1914 to 1939
comprise the next period. This is mostly WWI, in which warfare changed in revolutionary
ways, as many new technologies, such as tanks, airplanes and chemical warfare, were used.
The next period, from 1939 to 1945, has data from World War II (WWII). This is the most
important subset because it has more data on the nations that we are interested in than the
other subsets and the way battles were fought more closely resembles today’s concepts.
Another advantage of this period is that the data are more reliable because record
keeping was much better than before WWII. The last period is from after WWII to the
present.
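
The six-period split can be expressed as a simple binning of the battle year. The sketch below is illustrative only. It assumes the column convention of the period tables (for example "1755+ thru 1814"), in which the upper bound is inclusive, so that a battle dated exactly 1939 falls in the "1913+ thru 1939" bin; the function and constant names are hypothetical.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first five periods (assumed from the table headings).
PERIOD_END_YEARS = [1755, 1814, 1913, 1939, 1945]
PERIOD_LABELS = [
    "1600+ thru 1755",
    "1755+ thru 1814",
    "1814+ thru 1913",
    "1913+ thru 1939",
    "1939+ thru 1945",
    "1945+ thru 2000",
]


def period_of(year: int) -> str:
    """Return the period label for a battle year."""
    # bisect_left finds the first period whose end year is >= year.
    return PERIOD_LABELS[bisect_left(PERIOD_END_YEARS, year)]
```

With this convention, `period_of(1863)` places an American Civil War battle in "1814+ thru 1913" and `period_of(1944)` places a WWII battle in "1939+ thru 1945".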


The number of battles of different countries in different periods in the data set is
shown in the tables below. For acronyms, see Appendix C.


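[Table 1 body not recoverable from the source. Column headings: RowNames; 1600+ thru 1755;
1755+ thru 1814; 1814+ thru 1913; 1913+ thru 1939; 1939+ thru 1945; 1945+ thru 2000]

Table 1. Battles Per Period, Attacker.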


[Table 2 body not recoverable from the source. Column headings: RowNames; 1600+ thru 1755;
1755+ thru 1814; 1814+ thru 1913; 1913+ thru 1939; 1939+ thru 1945; 1945+ thru 2000]

Table 2. Battles Per Period, Defender.




Tables 1 and 2 show the number of battles in which countries were involved 
during different periods. The first table contains the numbers for when the country was an 
attacker (nationA), the second when it was the defender (nationD). The countries that will 
be analyzed are highlighted. As an example, the USA has 179 attacks, 94 of them in
WWII, and Germany has 180 defenses, 110 of them in WWII.


Tables 1 and 2 reveal a problem. Although the data set includes 657 battles, the 
number of battles decreases dramatically when divided into subsets, making analysis
difficult. To overcome this problem, the following is done.


The years 1600 to 1914 are considered as a single period. This follows Coban [Ref. 4], who
considered the battles prior to WWI as a group. He showed that intangibles are the most
important factors in this period.


The names of the following countries are combined: BR and ENG, PR and GER, and
SOV and RUSS. CS (Confederate States) and the USA are not combined, because the
author considered them different countries since the battles of CS were against the USA.
This is also in line with the way Hartley analyzed the data [Ref. 1]. Also, again for the
same purpose, the focus will be on four nations: the USA, Germany, Britain, and Israel.
The new tables of battles per period for the four countries we will analyze are Tables 3
and 4.
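
The merging of nation codes can be sketched as a small recoding step applied before counting battles. This is an illustration under stated assumptions: the choice of which code in each pair is kept (BR, GER, SOV) is arbitrary, and the function names and the sample list of codes are hypothetical, not drawn from the data set.

```python
from collections import Counter
from typing import Iterable

# Pairs combined in the text: BR/ENG, PR/GER, SOV/RUSS.
# Which code of each pair is kept is an assumption made for this sketch.
MERGED_CODES = {"ENG": "BR", "PR": "GER", "RUSS": "SOV"}


def merge_nation(code: str) -> str:
    """Map a nation code to its merged code; CS and USA stay separate."""
    return MERGED_CODES.get(code, code)


def battles_per_nation(codes: Iterable[str]) -> Counter:
    """Count battles per merged nation code."""
    return Counter(merge_nation(c) for c in codes)


# Hypothetical attacker codes, for illustration only.
sample_counts = battles_per_nation(["ENG", "BR", "PR", "GER", "CS", "USA", "USA"])
```

In this made-up sample, the two British and two German codes are pooled, while CS and the USA are counted separately.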


[Table 3 body not recoverable from the source. Column headings: 1600+ thru 1913;
1913+ thru 1939; 1939+ thru 1945; 1945+ thru 2000; total]

Table 3. Battles Per Period, Attacker.





Table 4. Battles Per Period, Defender. 
B. DESCRIPTIVE STATISTICS

In this section, the important variables of the CDB90G data set will be analyzed. 
Fifteen different variables are considered as potentially important, that is, potentially 
affecting the outcome of the battle. This decision follows the selections made by Coban 
[Ref. 4] and Hartley [Ref. 1], and also reflects the author’s military judgment. The 
variables are divided into two subsets of “objective” and “relative” variables. Objective 
variables are those whose values can be collected from the battleground or from “hard” 
data. They are force ratio, artillery ratio, air force ratio, cavalry ratio, and tank ratio. 
These variables can be known before the confrontation and can be agreed upon by 
different people. While the accuracy of these data is suspect [Ref. 1], they are based on
numbers, so they have the same meaning for everybody. As an example, one tank is one
tank for all analysts. Therefore, these variables are called objective variables. On the
other hand, relative variables, such as leadership, training, and combat effectiveness, are
totally subjective, their values being based on the judgment of military historians. Unlike
the case with objective variables, it is extremely difficult to decide the values of these
before the battle, and differences between different people’s figures are almost guaranteed.
Therefore, they are called “soft” data, and are almost universally avoided in models
[Ref. 1]. We will not be an exception.


For our purposes, then, objective variables are much more important than relative 
variables. There are other works, Hartley [Ref. 1] and Coban [Ref. 4], which used relative 
variables in their models. We will follow a different method. All variables will be 
analyzed in this section to reveal characteristics of different nations, but after this section, 
the relative variables will not be analyzed again. Instead, we will try to replace all the 
relative variables with just one variable: Nationality. 

1. Treatment of the Data

In the data set, some relative variables (relative combat effectiveness, leadership, 
training, morale, logistics, momentum, intelligence, technology, and initiative) have 
values ranging from “-4” to “+4.” A value of “-4” shows that the variable very strongly 
favors the defender, while “+4” shows that the variable very strongly favors the attacker. 
A level of “0” favors neither side. The variable surprise is given on a scale between “-2” 
and “+2.” Again, negative values favor the defender and positive values favor the attacker. 
However, it is very difficult to scale these qualities in this much detail. Therefore, 
following Coban’s methodology, we will give those variables only 3 values: “A” for an 
advantage to the attacker, “D” for an advantage to the defender, and “O” for no 
advantage to either side. [Ref. 4] 
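
The three-level recoding can be sketched in a few lines of Python. This is only an illustration: the function name and the sample scores are hypothetical, and we assume that the sign of the historian's score alone decides the category, whatever its magnitude.

```python
def recode_advantage(score):
    """Collapse a signed historian score into "A", "D" or "O".

    Assumption: any positive score is an attacker advantage and any
    negative score is a defender advantage, regardless of magnitude.
    """
    if score > 0:
        return "A"  # advantage to the attacker
    if score < 0:
        return "D"  # advantage to the defender
    return "O"      # no advantage to either side


# Hypothetical scores on the -4..+4 scale (not taken from the data set).
example_scores = [-4, -1, 0, 2, 4]
recoded = [recode_advantage(s) for s in example_scores]
print(recoded)
```

The same function would apply to the surprise variable, since its -2 to +2 scale uses the same sign convention.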


Since Coban also used the same data analysis method, classification trees, in his 
models, we will try to follow him when possible, and try to compare our findings with 
his. As in his analysis, weapons effects are expressed as ratios. In some battles, the 
attackers had no weapons of a particular type. This makes the ratio zero, which gives no 
information about the number of the defender’s weapons. In some other cases, the 
defender had no weapons and that makes the ratio infinity. Adding a constant to both 
sides avoids these two pitfalls. Therefore, in finding ratios, one is added to each side’s 
strength. When neither side had a particular weapon system, e.g. tanks, a missing value 
indicator is assigned to the ratio variable. [Ref. 4] 
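
A minimal sketch of this ratio rule follows. The function name is hypothetical, and Python's `None` stands in for the missing value indicator; the counts in the example calls are invented for illustration.

```python
def weapon_ratio(attacker_count, defender_count):
    """Attacker-to-defender ratio with one added to each side.

    Returns None (our stand-in for the missing value indicator) when
    neither side had the weapon system at all.
    """
    if attacker_count == 0 and defender_count == 0:
        return None
    return (attacker_count + 1) / (defender_count + 1)


# Hypothetical tank counts, chosen only to show the three cases.
print(weapon_ratio(0, 9))   # attacker has none: small but nonzero ratio
print(weapon_ratio(9, 0))   # defender has none: finite ratio
print(weapon_ratio(0, 0))   # neither side has any: missing
```

Adding one keeps the ratio finite and nonzero, so a battle in which only one side fielded the weapon still carries information about how many the other side had.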


Descriptive statistics will help understand the properties of objective and relative 
variables and nationalities. Tables, boxplots, barplots and histograms are used when 
informative. 

2. Response Variable 

a. Battle Outcome: “WINA” 
The outcome of the battle is expressed in the variable “WINA”. A value of 
“1” represents an attacker win; “-1” means that the attacker did not win: either the 
attacker lost or the historians judged that the battle was a draw. 

Figure 2. Proportion of Battles Won By Attacker. 

Figure 2 shows the ratio of the battles won to all of the battles fought 
(Battles Won/Battles Fought | Attack). Israel existed only in 
the last period, and we do not have any data from Britain or Germany after 
WWII. For this reason, there are some missing bars. Before WWI, the 
Germans won a large portion of the battles when they were attackers, and 
this ratio consistently fell in later periods. The USA’s hundred percent 
refers to battles in the Korean War. 

3. Objective Variables 

As mentioned in the introduction, the objective variables to be analyzed include 
force ratio, artillery ratio, tank ratio, cavalry ratio and air force ratio. All of the countries 
are analyzed when they were attacking. Although analyses with the nation defending were 
done as well, they are not presented here because the results were not useful. 

a. Force Ratio 


The basic formula for force ratio is: 
FR = A / D, 
where 
A is the total strength of the attacker in manpower and 
D is the total strength of the defender in manpower. 

The strength refers only to the combatants, and troops on either side are 
assumed to be identical. The following is a boxplot of the force ratios. 

Figure 3. Force Ratios of Attacking Countries. 


The first boxplot will be explained in detail. This plot is drawn by the 
“bwplot()” command in S-Plus version 2000 [Ref. 13]. This function enables us to 
draw boxplots for multiple groups, in this case attackers, on one chart. The force ratios 
are on the X axis, and the names of the countries are on the Y axis. The graphic is divided 
into two sections by a vertical line. Above the section on the left is the number “-1”, 
which refers to the value of the outcome variable, “WINA”. As explained in the 
respective section, -1 means that the attacker did not win. So, the force ratio 
boxplots of the battles that the attacker did not win are on the left, and the ones that the 
attacker won are on the right. 
The rectangle-like shapes in the plot are called “boxplots”. In recent years, 
boxplots have successfully been used to describe the prominent features of data sets. 
These features include center, spread, the extent and nature of any departure from 
symmetry, and identification of outliers [Ref. 8]. The point in the center shows the 
median. The width of the rectangle is an indicator of variability: the wider the rectangle, 
the greater the variability. This width is called the fourth spread, f_s. 
Data between the first and the third quartiles (the middle 50 percent) fall in this rectangle. 
The left end of the rectangle (lower fourth) is the median of the smallest n/2 observations, 
and the right end (upper fourth) is the median of the largest n/2 observations. The 
whiskers extend to the smallest and the largest observations that are not 
outliers. Any observation farther than 1.5 f_s from the closest fourth is an outlier and is 
represented as a small circle. 
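
The boxplot quantities described above can be computed as in the following sketch. The function name and the sample force ratios are hypothetical. We also assume that, when n is odd, the median is counted in both halves; the text itself only speaks of the smallest and largest n/2 observations.

```python
from statistics import median


def boxplot_summary(values):
    """Median, fourths, fourth spread and outliers of a sample."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # assumption: the median joins both halves when n is odd
    lower_fourth = median(xs[:half])
    upper_fourth = median(xs[n - half:])
    fs = upper_fourth - lower_fourth  # fourth spread
    low_fence = lower_fourth - 1.5 * fs
    high_fence = upper_fourth + 1.5 * fs
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return {
        "median": median(xs),
        "lower_fourth": lower_fourth,
        "upper_fourth": upper_fourth,
        "fourth_spread": fs,
        "outliers": outliers,
    }


# Hypothetical force ratios (not taken from the data set).
summary = boxplot_summary([0.75, 1.0, 1.25, 1.5, 2.0, 2.5, 3.0, 12.0])
print(summary)
```

In this invented sample the single very large ratio lies more than 1.5 f_s beyond the upper fourth, so it would be drawn as a small circle rather than as the end of a whisker.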


Force ratio is universally considered to be an important factor in battle 
outcomes. When Figure 3 is examined, it can be seen that there are differences between 
countries. The first difference is their force ratios. The next table gives each country’s 
average force ratio in the battles it won and in the battles it lost while attacking. 


Table 5. Force Ratio Averages. 





The USA has the highest average force ratio. Israel has a higher force 
ratio in the battles it lost than in the battles it won. 


As can be seen from Table 5, the USA has a bigger force ratio than the 
others. The USA’s force ratio is three times that of Israel, which always has a smaller 
force ratio than the other three countries. Israel also has less variability. The boxplots in 
Figure 3 are almost symmetric; that is, the distributions of force ratios for a particular 
country in the battles it won and in the battles it lost are almost identical, except for the 
USA. Normally, the force ratio in the battles won is expected to be higher than in the ones 
lost. We ask whether the median force ratio of Germany and Britain in the 
battles they won is greater than in the ones they lost. If the data set is treated as the whole 
population, the answer does not require any statistics; it is 
only necessary to compare the medians. Wilcoxon’s rank-sum test is used here to see whether 
the differences in median force ratio when winning and losing for Germany and Britain 
are distinguishable from what would be obtained by random samples from the same 
distribution. 

$H_0: \mu_1 - \mu_2 = 0$ 
$H_a: \mu_1 - \mu_2 > 0$ 

where 
$\mu_1$ = Median force ratio when attacking and winning 
$\mu_2$ = Median force ratio when attacking and losing 


Wilcoxon’s rank-sum test gives a p-value of 0.4845 for Britain and 
0.183 for Germany. At a five percent significance level, there is no evidence to reject 
the null hypothesis for either country. 
That is, the data do not show that the medians differ: neither of the countries had a 
significantly higher force ratio in the battles it won than in the battles it lost. When 
the median force ratios of Germany and Britain in the battles they won are compared, the 
p-value is 0.071. This hints at a difference in their force ratios, but the hypothesis that the 
medians of these two countries when attacking and winning are the same cannot be 
rejected at the 0.05 significance level. 
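
For readers who want to see the mechanics of the test, the following is a minimal sketch of a one-sided rank-sum test using the large-sample normal approximation. It is not the S-Plus routine used for the p-values above: the function name is hypothetical, the two samples are invented, and the sketch applies neither a continuity correction nor a tie correction to the variance.

```python
import math


def rank_sum_test_greater(sample_x, sample_y):
    """One-sided Wilcoxon rank-sum test of H_a: x tends to exceed y.

    Returns (W, z, p), where W is the rank sum of sample_x and p comes
    from the normal approximation (no continuity or tie correction).
    """
    combined = sorted([(v, 0) for v in sample_x] + [(v, 1) for v in sample_y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Tied values share the average of the ranks they occupy.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    m, n = len(sample_x), len(sample_y)
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    mean_w = m * (m + n + 1) / 2
    var_w = m * n * (m + n + 1) / 12
    z = (w - mean_w) / math.sqrt(var_w)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return w, z, p_value


# Hypothetical force ratios in battles won and lost (not from the data set).
won = [1.2, 1.9, 2.4, 3.1, 0.9, 2.2]
lost = [1.1, 2.0, 1.6, 2.8, 1.0]
w_stat, z_stat, p = rank_sum_test_greater(won, lost)
print(w_stat, round(z_stat, 3), round(p, 3))
```

A large p-value from such a test means only that the winning and losing samples cannot be told apart; it does not establish that the two medians are equal.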


The USA clearly has a bigger force ratio and more spread in the battles it won 
while attacking than in the battles it lost. The final interesting feature is that 
not only did the USA have a higher force ratio, but it also has many outliers. The 
numerical dominance of the USA on the battlefield and its effects will be discussed later. 

b. Artillery Ratio: “arty” 

arty = A_A / A_D 

where 
A_A = Number of artillery tubes of the attacker and 
A_D = Number of artillery tubes of the defender. 

This is the only variable which is present in all periods. Prior to WWI, the 
artillery ratio varied a lot [App. B.B.]. After the start of the 20th century, artillery was 
used extensively. 

Figure 4. Artillery Ratio, Entire Data Set. 


Figure 4 shows the boxplots of artillery ratios in the entire dataset. The 
battles with an artillery ratio more than 20 are not included for the sake of 
interpretability. One point is worth mentioning. During WWII, the USA 
has an average artillery ratio of 8.56. This is very much affected by a very 
big advantage, 20.18 in 1944. 


Figure 4 suggests that, like force ratio, there are differences in artillery 
ratios between different countries as well. The USA again had a very large advantage 
compared to other countries, and more so in the battles they won. Like force ratio, many 
outliers can be seen in the USA’s boxplots. Israel has the smallest advantage compared to 
other forces. Britain’s artillery ratio looks higher in the battles they won than the battles 
they lost. Germany’s artillery ratio when they won and when they lost look similar. 
Again, as we did with the force ratios, Wilcoxon’s rank-sum test is used to test 
whether the artillery ratio when winning is greater than when losing for Britain and Germany. 


The test gives a p-value of 0.03161 for Germany. This suggests that, at a 
five percent significance level, the artillery ratio when winning is higher than when losing, as 
expected. 

For Britain, the p-value of 0.5193 gives no evidence that the median 
artillery ratios when winning and losing differ. 

c. Close Air Support Ratio: “fly” 

fly = F_A / F_D 

where 
F_A = Number of close air support sorties of the attacker and 
F_D = Number of close air support sorties of the defender. 


Close air support is very important in today’s warfare. After armies around 
the world began to use them in combat, airplanes became one of the most important 
factors. Coban [Ref. 4] found that it is the most important variable in wars after WWI. 
Today, the first Gulf War and operations in Serbia have proven that an air force is a 
dominant factor in defining the outcome of a battle. 

Although airplanes were used in WWI, there is so little data in the data set 
that we decided to start with WWII, which is the first war in the data set in which air 
forces played a major role in the outcome of battles. 
Figure 5. Air Force Ratio, All Dataset. 

Figure 5 contains all battles post-WWI. It is very similar to the plot drawn 
with the data from WWII [App. B.C.], the difference being data from the 
Arab-Israeli wars and the Korean War. 


The USA used airplanes much more than other countries. The graph is 
affected greatly by the very high figures of the USA. Therefore, it is difficult to read 
other countries’ boxplots. Unlike artillery ratio, we did not worry about truncating the 
data at a particular point this time because the difference is very large. The USA’s 
overwhelming dominance with respect to the air force is an undeniable fact. 

Among other countries, Israel used its air force more than Germany or 
Britain. Although countries other than Britain had a bigger air force ratio when they 
won than when they lost, the differences are small. 
d. Tank Ratio: “tank” 
tank = T_A / T_D, 

where 
T_A = Number of tanks on the attacker side and 
T_D = Number of tanks on the defender side. 

Just like planes, tanks were used in WWI on a very small scale, but the 
real use of tanks happened in WWII. 

Figure 6. Tank Ratio, All Battles. 

Figure 6 has battles of the entire data set. It is very similar to the plot 
drawn with the data from WWII [App B.D.]. This plot is also affected by 
the USA’s dominance. 

To compare countries other than the USA, the following boxplot is used. 

Figure 7. Tank Ratio, Israel, Germany and Britain 


Figure 7 includes data with a tank ratio of 20 or less. With this truncation, 
12 data points out of 501 are lost from Figure 6. This truncation is 
necessary to be able to compare these three countries. 


According to Figure 7, Germany had a higher tank ratio than the other two 
countries. Britain and Israel’s tank ratios, Britain’s especially, were higher in the battles 
they lost than the battles they won. This variable seems to have different patterns within 
every individual country. 

Variable Ratio Values - Battles USA Won During WWII 
Figure 8. Ratios of the Objective Variables of Battles in WW2. The USA is the Attacker 
Figure 9. Ratios of the Objective Variables of Battles in WW2. The USA is the Attacker 
and Loser. 


As Figures 8 and 9 suggest, towards the end of WWII, the USA began to 
have a very big advantage over its opponents. This advantage became 
overwhelming in the “tank” and “fly” ratios. This big advantage over the 
opponents is the main reason for the variability pattern in the charts 
analyzed above. 
e. Cavalry Ratio: “cav” 
cav = C_A / C_D 

where 
C_A = Number of cavalry on the attacker side and 
C_D = Number of cavalry on the defender side. 

Cavalry Ratio is present in the data set from 1600 to 1905. 

Figure 10. Cavalry Ratio. 


While Britain and Germany had similar cavalry ratios when they won or 
lost, the USA had a much bigger ratio when it won than when it lost. The 
USA’s cavalry ratio also has a high variability in the battles where the 
USA won. Again, the USA has a higher ratio than the other countries. 
The cavalry ratio is the last objective variable to be analyzed. In the next 
section, the relative variables will be analyzed. A general discussion on all of the 
variables analyzed, both objective and relative, can be found at the end of this chapter. 

4. Relative Variables 

Relative variables are represented as categorical variables. As discussed in the 
previous section, relative variables are generally avoided by analysts in their models. 
Although no question exists concerning the importance of these variables, the fact that 
their values depend largely on personal judgment makes them less reliable. 

Another reason for not using them in our classification trees is that the data does 
not have sufficient information on them. This is discussed at the end of this chapter in the 
“Discussion on the Relative Variables” section. However, a preliminary analysis is done 
to see the relationship between nationality and these variables. At this point, it is worth 
remembering our goal: to replace the relative variables with one variable: nationality. 


In the following tables, the letter “A” denotes the battles in which the attacking 
side had an advantage, “D” denotes an advantage for the defending side and “O” means 
there was no advantage on either side. For example, if “SURPA” is “A” for a 
particular battle, the attacker had the “Relative Surprise” advantage in that battle; 
“D” says the defender had the advantage, and “O” says neither had the 
advantage. To familiarize the reader, an example table is explained below: 


The “COUNTRY” column shows the name of the country. The “OVERALL” 
section of the time periods represents the whole data set. 

The cells having a number higher than 50 percent are highlighted, which makes it 
easier to see higher figures and patterns in the data. The cells where a corresponding 
figure is not available in the data set (for example, “ISRAEL” does not have any battles 
prior to 1946) are simply marked with “na”. 

Example cell, from the table “Surprise Advantage Attacker Loses” (the title gives the 
name of the variable and the role of the country): 0.14 


Now, we will explain how to read this table, using an example cell. The cell above says 
that, in WWII, among those battles in which Britain was the attacker and loser, the 
attacker (Britain) had the “Surprise Advantage” 14 percent of the time. 
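
The way such a cell is obtained can be sketched as follows. The record layout, the function name and the five battle records are hypothetical and serve only to show the calculation; they do not reproduce the 14 percent figure.

```python
from collections import Counter

# Hypothetical battle records; none of these rows come from the data set.
battles = [
    {"attacker": "Britain", "period": "WWII", "WINA": -1, "SURPA": "A"},
    {"attacker": "Britain", "period": "WWII", "WINA": -1, "SURPA": "O"},
    {"attacker": "Britain", "period": "WWII", "WINA": -1, "SURPA": "O"},
    {"attacker": "Britain", "period": "WWII", "WINA": -1, "SURPA": "O"},
    {"attacker": "Britain", "period": "WWII", "WINA": 1, "SURPA": "A"},
]


def cell_proportions(records, country, period, outcome, variable):
    """Share of A / D / O values among one country's attacks in one period."""
    selected = [
        r for r in records
        if r["attacker"] == country
        and r["period"] == period
        and r["WINA"] == outcome
    ]
    if not selected:
        return None  # no battles: the cell would be marked "na"
    counts = Counter(r[variable] for r in selected)
    return {level: counts[level] / len(selected) for level in ("A", "D", "O")}


lost = cell_proportions(battles, "Britain", "WWII", -1, "SURPA")
print(lost)
```

Each cell is therefore a proportion conditional on the country, the period and the outcome, which is why the three entries for “A”, “D” and “O” in a row sum to one.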
The relative variables considered important are analyzed below. First, we 
highlight the two that appeared to be more important than the others. These two 
variables were also found important by Coban [Ref. 4]. 


a. Relative Surprise: “SURPA” 
Table 6. Surprise Advantage Attacker Wins. 





For most of the battles, regardless of nation or period, there was no 
advantage on either side. Significant ones are highlighted. It is worth noting that in all the 
battles the attacker won, the defender never had a surprise advantage. 

b. Relative Initiative Advantage: “INITA” 

This is one of the more important variables. It can be said that, among all 
the relative variables, this is the only one with consistently significant values. One 
interesting point is that the attacker had an initiative advantage in more than 75 percent of 
the battles it won in all of the subsets except for one: Germany had an advantage in 64 
percent of the battles during WWI. When the attacker won, the defender never had an 
initiative advantage. 


Table 7. Relative Initiative Advantage, Attacker Wins. 


SURPA and INITA are two of the most important variables in the data 
set. Other variables will be discussed with the help of Table 8. This table includes all of 
the relative variables used in the analysis. Time periods are not as detailed as in the 
individual tables like Tables 6 and 7. Tables of individual variables are in Appendix A, 
and they will be referred to when necessary. Again, all of the figures of Table 8 are from 
the battles where the countries were attacking. 


In Table 8, the values in the cells, as in previous tables, are 
proportions. To make the tables easier to read, the cells are formatted in three ways, 
depending on whether the cell contains a value greater than or equal to 0.8, a value 
between 0.5 and 0.8, or a value less than 0.2. 

Table 9 contains the exact numbers for each cell, instead of ratios. Figures 
in each cell refer to the number of battles. 

Table 8. Ratio of All Relative Variables. 
Table 9. All Relative Variables with the Number of Battles. 
Tables 8 and 9 summarize all relative variables for our four countries 
while attacking. There is no time segmentation. All the sections of the tables show 
figures from the entire data set. While Table 8 contains ratios, Table 9 shows the exact 
number of battles in each cell. 


The rest of the relative variables, those with less significance, are listed 
below. The detailed time period tables are provided in Appendix A. 

(1) Relative Combat Effectiveness: “CEA”. Until WWII, when 
the attacker won, the defender never had a combat effectiveness advantage. In WWII, in 
Britain’s battles, the defender had this advantage in 53 percent of the battles and Britain 
still won. Israel had this advantage even in the battles it lost (86 percent). [App A.B.] 

(2) Relative Leadership Advantage: “LEADA”. Until WWI, in 
more than half of the battles, the side with this advantage won. In WWI and WWII, 
neither side had a significant advantage. In the battles Israel fought, the defender never 
had a leadership advantage. [App A.D.] 

(3) Relative Morale Advantage: “MORALA”. There is no 
significant morale advantage on either the attacker or the defender side except for the USA. In WWI, 
the USA had the relative morale advantage in all battles. [App. A.F.] 

(4) Relative Logistics Advantage: “LOGSA”. None of the 
nations had a significant logistics advantage in any of the battles. [App A.G.] 

(5) Relative Momentum Advantage: “MOMNTA”. The only 
significant advantage is on Germany’s side in WWII. Germany had a momentum 
advantage in 65 percent of the battles in WWII where it attacked and won. Also, in the 
entire data set, the defender never had the momentum advantage, with one exception: US 
battles in WWII. In six percent of the battles the USA lost when attacking, the 
defender had a momentum advantage. [App A.H.] 

(6) Relative Intelligence Advantage: “INTELA”. There is no 
significant advantage on either side. The only exception is Germany in both World Wars. 
It is not very significant, but when Germany attacked and won, it had an advantage in 36 
percent of the battles in WWI and 40 percent in WWII. [App A.L.] 
(7) Relative Air Superiority: “AEROA”. This variable 
reflects the quality of the air force. The USA and Britain had this advantage in a large 
portion of the battles they lost, but not so large in the battles they won. Israel had this 
advantage in 86 percent of its battles. 

C. GENERAL DISCUSSION ON RELATIVE VARIABLES 

In this section, we look at the relative variables together. 
Looking at Table 8, three trends can be easily seen: 


The defender hardly ever had an advantage over the attacker. The 
countries we analyze are the attackers. Only 7 out of 80 cells belonging to 
the defender have values of more than 20 percent, the highest being 31 
percent. Thus, can we say that this data suggests these countries always 
fought against countries of inferior quality? Not really, because 
they also fought each other. However, interestingly enough, 
according to the data, for all variables with the exception of “Initiative 
Advantage”, there is often no advantage on either side. In more than 50 
percent of the battles, neither side had the advantage, except for those of 
Israel. 


Israel has different values than the other countries analyzed. For the variables 
“CEA”, “LEADA” and “TRNGA”, Israel had an obvious advantage over 
the defender. However, interestingly enough, there is no significant 
difference between Israel’s degree of advantage when it won and when it lost. 
For example, it had a Leadership advantage in 57 percent of the battles 
both when it won and when it lost. And it had a Combat Effectiveness 
advantage in all of the battles (100 percent) it won and in 86 percent 
of the battles it lost. 


“Initiative Advantage” is the only variable where the attacker consistently 
had an advantage over the defender. This results from the fact that the 
attack is often done to seize the initiative. “Offensive operations are the 
means by which a military force seizes and holds the initiative while 
maintaining freedom of action and achieving decisive results. This is 
fundamentally true across all levels of war.” [Ref. 11]. In other words, the 
attacking side has the initiative advantage almost “by definition”. 


If the two exceptions discussed earlier, Israel among the countries and 
“Initiative Advantage” among the variables, are put aside, in more than 50 
percent of the battles, there is no advantage on either side. As an 
attacker, Britain had a Combat Effectiveness Advantage in 48 percent of 
the battles, which is the only exception. This also supports our earlier 
claims. Deciding the values of relative variables is so difficult that even 
the historians could not find an advantage on either side in more than 50 
percent of the battles. 




D. SUMMARY 


In this chapter, the variables that are considered to be important were analyzed 
with respect to the countries. Other than the results discussed in the previous sections, 
some other important considerations are given below: 


It is important to note that, although this data set is the best data set on 
historical land combat, it is not at all perfect. It would be a serious mistake 
to accept this data as the ultimate truth because: 


1. The data was collected by military historians. Therefore, the battles 
listed in the data set were chosen according to their comfort level. They are 
not all of the battles fought, nor are they necessarily the most 
important ones or a random sample. It may be the case that the historians 
did not have sufficient data on many very important battles and 
therefore ignored them. 


2. The countries we focus on, the USA, Britain, Germany and Israel, 
are usually considered to be successful on the battlefield. We are 
forced to do this because these are the ones for which sufficient 
data exists. They all have similar kinds of properties: Extensive use 
of technology, a large economy behind the war machine and 
extensive experience. Some may argue that Israel does not fall into 
that category, but compared to their enemies, the difference is 
obvious. As a result, it is hard to find the advantages caused by 
nationality. 


All of the battles in the data set have only one nation as the 
attacker and one as the defender. This is not the case in many 
battles. There were and still are alliances. This appears to be another 
limitation of the data set. 


An interesting point discussed before is the fact that the USA had a huge 
power on the field towards the end of WWII. It really is difficult to 
analyze the nationality factor of the USA. It can be said that accumulating 
a big power on the battlefield and outnumbering the enemy is a main 
characteristic of the USA, but this could only be decided upon for certain 
with the support of military historians. 


When the objective variables were analyzed, all countries had different 
characteristics. Had they had similar properties, it would have been easier 
to determine the effect of nationality. However, in our case, one may not 
be able to decide whether it is the objective variables or the nationality 
factors that affect the outcome. 


Israel’s smaller values than the others, smaller force ratio, artillery ratio 
etc., suggest that the way in which wars are fought has changed. Even 
fewer weapons can and do provide more lethality. 


I. CLASSIFICATION TREES 


A. INTRODUCTION 

In this chapter, classification tree models will be analyzed. First, the reader will be 
informed about tree-based modeling, a relatively new analysis method. Second, we will 
introduce the tree models built using the CDB90G data set. A discussion on the models 
built, and further study suggestions, will conclude this chapter. 


Tree-based modeling is an exploratory technique for uncovering structure in data. 
Specifically, the technique is useful for classification and regression problems when one 
has a set of classification or predictor variables (x) and a single response variable (y). 
Tree-based models are relatively new, but are gaining widespread popularity as a means 
of devising prediction rules for rapid and repeated evaluation, as a screening method for 
variables, as a diagnostic technique to assess the adequacy of linear models, and simply 
to summarize large multivariate datasets. [Ref. 7] We will use them for the last purpose. 


Trees simply show the structure of the data. Trees do not need distributional 
assumptions, and as such, transformations are not needed. Any interactions between 
variables are automatically included in the tree structure. Furthermore, they are robust to 
outlying data. 

Trees are arranged hierarchically. Until a terminal node is reached, the data 
flowing down the tree encounters one decision at a time. 

One of the advantages of tree-based models is that they are easy to read. There are 
oval (non-terminal or split) and rectangular (terminal) nodes. Each node contains the 
predicted outcome and the distribution to the child nodes. The split criterion is shown on 
each branch. 


For example, Figure 11 shows a tree model built on the entire data set. The root 
node says that there are 657 (260+397) data points in the data set. In 260 of them 
the attacker did not win, and in 397 the attacker won. The first split is 
determined by which country is defending. If the defender is Britain, the Confederate 
States, Israel, or the USA, we go to the left node (a terminal node), and if the defender is 
one of the other nations (Austria, Egypt or the others listed on the right branch), we 
go to the right node (which is a split node). According to the terminal node 
on the left, the tree model predicts that the attacker does not win, i.e., it could be a loss or 
a draw. At the terminal node on the left, there are 138 observations, 87 of which are “-1” 
for attacker losses, and 51 are “1” for attacker wins. If we go to the right branch, we 
reach a split node. In that split node, there are 173 observations with a WINA value of 
“-1” and 346 observations with a WINA value of “1”. At that node, the question is “What 
is the force ratio?” If the force ratio is less than 5.38194, then the attacker is predicted to 
win. This is the terminal node in the middle. Again, 171 refers to the number of “-1”s in 
that node, and 318 refers to “1”s. If the force ratio is greater than 5.38194, the right branch 
is chosen, which also suggests a win for the attacker. However, 
although both of these terminal nodes, the middle one and the one on the right, 
suggest a win for the attacker, their misclassification rates are different. Now, we will 
discuss the algorithm behind the tree models and how the splits are decided and the tree 
built. 
Figure 11. Tree Model of the Entire Data Set. 
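
The reading of Figure 11 can be written out as a small sketch. The function and variable names are hypothetical. The class counts of the left and middle terminal nodes are the ones quoted in the text, while the counts of the right terminal node are not quoted and are derived here by subtraction (173 - 171 and 346 - 318). Sending a force ratio exactly equal to the cut to the right branch is also an assumption.

```python
# Defender nations that the text places on the left branch of Figure 11.
LEFT_BRANCH_DEFENDERS = {"Britain", "Confederate States", "Israel", "USA"}
FORCE_RATIO_CUT = 5.38194

# Class counts per terminal node as (attacker did not win, attacker won).
# Left and middle counts are quoted in the text; the right-node counts are
# derived by subtraction and are therefore an inference, not a quotation.
LEAVES = {
    "left": {"prediction": -1, "counts": (87, 51)},
    "middle": {"prediction": 1, "counts": (171, 318)},
    "right": {"prediction": 1, "counts": (2, 28)},
}


def find_leaf(defender, force_ratio):
    """Follow one battle down the tree and return its terminal node."""
    if defender in LEFT_BRANCH_DEFENDERS:
        return "left"
    # Assumption: a force ratio exactly at the cut goes to the right branch.
    return "middle" if force_ratio < FORCE_RATIO_CUT else "right"


def misclassification_rate(leaf):
    """Share of observations in a leaf that disagree with its prediction."""
    not_won, won = LEAVES[leaf]["counts"]
    wrong = won if LEAVES[leaf]["prediction"] == -1 else not_won
    return wrong / (not_won + won)


for name in LEAVES:
    print(name, LEAVES[name]["prediction"], round(misclassification_rate(name), 3))
```

Under these counts, the middle node misclassifies roughly 35 percent of its battles and the right node roughly 7 percent, even though both predict an attacker win.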


The tree models are fit by binary recursive partitioning, by which the data set is 
successively split into increasingly homogeneous subsets. [Ref. 7] The usual set-up for 
regression trees, or classification trees if the response variable is categorical, is as follows. The 
n responses y_1, ..., y_n (in our models, the variable WINA) and the predictors x_i are collected 
for each y_i. Starting with all y’s in one node, the impurity of that node is measured. 
Impurity can be one of several different measures: deviance, Residual Sum of Squares 
(RSS), Mean Squared Error (MSE), etc. Both S+ and the rpart algorithm measure 
impurity by deviance. For more information on deviance, see Devore pp. 502-503 [Ref. 
8]. The objective is to divide the observations into sub-nodes of high purity, i.e., nodes that have as 
many similar y’s as possible. So, at a node (a “split”), if the predictor is categorical, the data is 
divided into two subsets, one including some of the categories, the other with the rest; 
e.g., we might separate a group of nations from the others. If the predictor is continuous, every 
possible split of the form X<a is considered. The criterion by which the split is decided is 
called the split criterion. Then, the impurity (e.g., RSS) is computed for each of the two 
groups. The split decreasing the impurity most is chosen. [Ref. 12] This process of 
splitting can continue down to every single observation, which would be over-fitting. In 
order to avoid this, the tree construction continues only until the number of observations in 
each node is small, by default n_i<20 for rpart, or the leaf is sufficiently homogeneous, 
i.e., with small impurity. 
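
The search for the best split on one continuous predictor can be sketched as follows. This is a simplified illustration and not the rpart implementation: the function names are hypothetical, the deviance of a node is taken to be -2 times the sum over classes of n_k log(n_k / n), candidate cut points are placed midway between consecutive distinct values, and the six observations are invented.

```python
import math
from collections import Counter


def deviance(labels):
    """Node impurity: -2 * sum over classes of n_k * log(n_k / n)."""
    n = len(labels)
    counts = Counter(labels)
    return -2.0 * sum(c * math.log(c / n) for c in counts.values())


def best_numeric_split(x_values, labels):
    """Return (cut, decrease) for the split X < cut that lowers impurity most."""
    pairs = sorted(zip(x_values, labels))
    parent = deviance([y for _, y in pairs])
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no cut point between equal predictor values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        decrease = parent - (deviance(left) + deviance(right))
        if best is None or decrease > best[1]:
            best = (cut, decrease)
    return best


# Hypothetical force ratios and outcomes (not taken from the data set).
force_ratios = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
outcomes = [-1, -1, -1, 1, 1, 1]
cut, decrease = best_numeric_split(force_ratios, outcomes)
print(cut, round(decrease, 4))
```

A full tree would apply the same search to every predictor at every node, keep the single best split, and then repeat the process on each of the two resulting subsets until the stopping rule is met.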


There are several tree methods available. Since there are many missing values in 
the data set (Table 10), we prefer to use the rpart [Ref. 10] method because of the way 
it handles the missing values. 


PERIOD     | fr | arty | fly | cav | tank | Total Number of Battles 
Before WWI |  0 |   95 | 251 | 107 |  251 | 251 
WWI        |  0 |   46 | 126 | 130 |  101 | 133 
After WWII |  0 |    2 |   2 |   7 |    4 |  80 
ToTaL__| 1] 60 #60 | 509 | oor] SS 





Table 10. Number of Missing Values. 


This table gives the number of missing values for the objective variables 
used in building the tree models. It is important to note that tanks and 
airplanes were not present before WWI and they were used on a very 
small scale during WWI. Also, cavalry was used mainly before WWI. 
Therefore, some of the large counts simply reflect historical facts. 
However, even taking this into consideration, a large missing value 
problem exists, forcing us to use rpart. 


In rpart, when missing values are encountered in considering a split, they are 
ignored and the probabilities and impurity measures are calculated from the non-missing 
values of that variable. Surrogate splits are then used to allocate the missing cases to the 
daughter nodes. Therneau [Ref. 10] contains more detail about the usage of 
surrogate splits in rpart. The next two paragraphs briefly explain the use of surrogate 
splits. 


Once a splitting variable and a split point for it have been decided, what is to be 
done with observations missing that variable? One approach is to estimate the missing 
datum using the other independent variables. rpart uses a variation of this to define 
surrogate variables. 


As an example, assume that “Force Ratio < 2” has been chosen as the split 
criterion and there are data points missing information on the force ratio. The surrogate 
variables to be used for the data points that are missing the value for the force ratio are 
then found by re-applying the partitioning algorithm (without recursion). The two 
categories “Force Ratio < 2” and “Force Ratio >= 2” are predicted using the other 
independent variables. For each predictor, an optimal split point and a misclassification 
error are computed. The surrogates are then ranked. Any observation that is missing the 
split variable is then classified using the first surrogate variable, or, if that is also 
missing, the second surrogate is used, and so forth. If an observation is missing all 
surrogates, the blind rule of “go with the majority” is used. Other strategies for these 
“missing everything” observations can be argued, but there should be few or no observations 
of this type. [Ref. 10] 
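
The surrogate mechanism described above can be sketched in a few lines. This is a 
simplified illustration under stated assumptions, not rpart's actual code: the function 
names, the variable names (force_ratio, arty_ratio, tank_ratio) and the toy values are 
hypothetical, missing values are represented by None, and surrogates are ranked only by 
how often they agree with the primary split. 

```python
def best_surrogate_split(primary, threshold, candidate):
    """Find the cut on `candidate` that best mimics the rule `primary < threshold`.

    Only observations with both values present are used.
    Returns (cut, left_if_below, agreement_count, n_used) or None.
    """
    pairs = [(c, p < threshold) for p, c in zip(primary, candidate)
             if p is not None and c is not None]
    values = sorted({c for c, _ in pairs})
    best = None
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2.0
        agree = sum((c < cut) == goes_left for c, goes_left in pairs)
        # Try both orientations: "below the cut goes left" and the reverse.
        for left_if_below, score in ((True, agree), (False, len(pairs) - agree)):
            if best is None or score > best[2]:
                best = (cut, left_if_below, score, len(pairs))
    return best


def rank_surrogates(primary, threshold, predictors):
    """Rank candidate predictors by their agreement with the primary split."""
    ranked = []
    for name, column in predictors.items():
        result = best_surrogate_split(primary, threshold, column)
        if result is not None:
            cut, left_if_below, agree, n = result
            ranked.append((agree / n, name, cut, left_if_below))
    ranked.sort(reverse=True)
    return ranked


def goes_left(obs, primary_name, threshold, ranked, majority_left):
    """Route one observation: primary rule, then surrogates, then the majority."""
    value = obs.get(primary_name)
    if value is not None:
        return value < threshold
    for _, name, cut, left_if_below in ranked:
        v = obs.get(name)
        if v is not None:
            return (v < cut) == left_if_below
    return majority_left  # the "go with the majority" rule


# Hypothetical toy data; the last battle is missing its force ratio.
force_ratio = [1.0, 1.5, 1.8, 2.5, 3.0, 4.0, None]
predictors = {
    "arty_ratio": [0.9, 1.1, 1.2, 2.0, 2.2, 3.0, 2.5],
    "tank_ratio": [1.0, 3.0, 1.2, 1.1, 2.9, 3.1, 1.0],
}
ranked = rank_surrogates(force_ratio, 2.0, predictors)
print(ranked)
```

In this toy example the artillery ratio reproduces the primary split exactly, so it is 
the first surrogate, and an observation missing the force ratio is routed by its artillery 
ratio. 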


Another issue with tree models is pruning. After building the model, it is usually 
the case that it is over-fitted. [Ref. 12] The trees are built in order to minimize the 
impurity. In doing so, the trees grow too big. In other words, the models fit the data at 
hand too closely. This, of course, decreases the model’s ability to predict new 
observations. Pruning is used to solve this problem. 


The question we ask then is whether we want a predictive model or a descriptive 
(explanatory) one. As mentioned above, trees can be used for both. If it is a predictive 
model, pruning to the optimum size using cross-validation is vital. However, for 
descriptive models, in other words, models to explore the data, pruning is not a great 
concern. The tree explaining the data best is used. 

One of the problems faced in this situation is lack of sufficient data to build 
predictive models. When the trees we built are pruned to the optimum size, they become too 
small to be used in predictions. As a result, we will use tree models to describe the data 
and explore the nationality factors in the data set. For this reason, pruning will not be our 
concern. All the models built are descriptive models. 

B. TREE MODELS 

In this section, the tree models that were built are examined. Trees were built by 
using the battles in which the countries that we are analyzing, the USA, Britain, 
Germany, and Israel, appeared either as an attacker or a defender, and only the objective 
variables were used as predictive variables. Since not all of the variables are present in 
each period, only the appropriate ones, whose names are given in the sections where the 
trees are described, are used to build the models. 

1. Model 1: The Battles Prior to World War I 

This is the model for the battles before 1910, see Figure 12. The model shows that 
nationality was the most important factor affecting the outcome of the battle. In other 
words, if only one variable were allowed, we would choose: “What is the nationality of 
the attacker?” Three of the countries in which we are interested appeared at the first split. 
According to the model, the USA, Germany and Britain tended to win the battles in 
which they were defending. This was correct in 47 of the 69 battles in which they were 
defending. 

2. Model 2: The Battles of World War I 

This is the model for the battles of WW I, see Figure 13. According to the model, 
the most important factor was force ratio. The second most important was nationality. 
However, it is important to note that the split criterion for the force ratio is 4.05. There 
are 15 battles where the attacker had a force ratio of at least 4.05, and the attacker won 
them all. Out of these 15, 10 were from the USA, 3 from Germany and 2 from Britain. This, 
again, leads us to the same question asked previously: Is it the nationality or the 
objective factors that have the real effect? 
3. Model 3: The Battles of World War II 

During WW II, the most important variable was artillery ratio. The second most 
important was, as in WWI, nationality. The USA, Germany and Britain again appear in 
the second split. They won the battles in which they were defending against an attacker 
who did not have sufficient artillery support. 

4. Model 4: The Battles that Israel Fought 

This particular model follows our historical segmentation. This model contains 
the battles fought after WWII, but instead of including all the battles, we focused on 
Israel, and tried to ascertain if nationality factors are important. 
Figure 12. Model 1, Battles Before World War I. 


Model 1 explains 76 percent of the battles. That is, the terminal nodes correctly 
classify the outcome 76 percent of the time. The most important factor is 
nationality. The USA, Britain and Germany appear in the first split as 
defenders explicitly and as attackers implicitly. They are attackers implicitly because 
the data contain only the battles of those three countries: when another country is 
defending, one of the three is the attacker. If we were to predict the outcome of a 
hypothetical battle in this period, we could predict the result by looking only at the 
nationality of the countries and we would be correct 71 percent of the time. If the USA, 
Britain or Germany is either defending or attacking, they win. A value of “-1” refers to a 
loss for the attacker or, in a few cases, a draw. 
After nationality, the single most important variable is force ratio. Other 
variables present at this period, artillery ratio and cavalry ratio, did not appear 
in our tree. The first split including force ratio is 1.08, which is interestingly 
small. This reminds us of the findings of Yigit [Ref. 3], and his work on the 3 
to 1 force ratio rule-of-thumb. As can be seen from the force ratio boxplots 
[App. 1.1], before WWI, the force ratios are small and there is no significant 
difference between the force ratios of the countries analyzed. This helps us to 
understand two things: (1) As we claimed before (Chapter II Conclusions) 
since the countries have similar properties, it is easier to decide whether 
nationality has an important effect. Also, in this case, the tree decided that 
nationality is the primary factor. (2) The split criteria related to force ratio are 
small because all countries had similar force ratios. 
Figure 13. Model 2, Battles of World War I. 


This model explains 79 percent of the battles. The most important factor is 
force ratio. The second most important factor is nationality. In this period, 
force ratio began to be much more important on the battlefield. Also, now, 
unlike the battles prior to WWI, the threshold is much higher. As the reader 
may recall, the first threshold for force ratio in the battles prior to WW I was 
1.08 (see Figure 12), as opposed to 4.06 in WWI. This is mostly because of the 
USA’s high force ratios: 10 out of 15 observations in the terminal node on the 
right are from the battles of the USA. Also, this is not surprising, considering the 
fierce defenses of that era. It takes sheer power, i.e., force ratio, to defeat the 
defender. Another point, though, is that all 15 battles that have the large force ratio 
have one of the three countries as the attacker. Again, how can one decide whether it is 
the force ratio or nationality that affects the outcome? The second and third splits 
are on nationality. If the USA was attacking, it won even with a force ratio less 
than 4.06. Britain and Germany were less successful at attacking, but the 
Germans were better defenders than the British. The USA was good at both 
defending and attacking. 
Figure 14. Model 3, Battles of World War II. 


This model explains 79 percent of the battles. The most important variable is 
artillery ratio. Technology and advanced weapons began to play a more 
important role on the battlefield. In the battles where the attacker did not have 
an artillery ratio advantage, the second most important factor is nationality. In 
those battles, the USA, Britain and Germany won as defending armies. In the 
battles where the attacker has an artillery advantage, the second most important 
variable is tank ratio. A tank ratio advantage of 3.7, along with an artillery advantage 
of 1.3, made the attacker’s victory very likely (64 out of 79 battles). When the 
tank ratio was smaller, Britain won the battles where it attacked, while 
Germany and the USA lost as attackers unless they had a tank ratio of 1.9 
or an artillery ratio of 3.15 or more. To summarize, all three countries were good 
defenders; Britain was a better attacker than Germany and the USA when it had less 
power. However, the importance of weapons appears much higher than 
in the previous time periods. 
Figure 15. Model 4, Battles that Israel Fought. 


This simple model explains 81 percent of the battles. The importance of 
advanced weapons is still increasing. The only important variable is air 
force ratio. However, we know from the data set that Israel won 82 
percent of the battles in which it attacked. Again, as we claimed previously (see 
Chapter II, Conclusions), the countries we are analyzing are those already 
making use of the more important factors, the decisive variables such as artillery 
and tank ratio in WW II or air force ratio after WW II. Thus, it is difficult 
to decide where nationality has a role, or what constitutes a nationality factor. 
These issues will be addressed at the end of this chapter. 


After the models are fit, the question is how good the models are, or how 
important the nationality factors are. For a predictive model, there are a couple of ways 
to ensure quality. One is cross-validation, which is used with rpart’s 
“prune.rpart()” command. This was tried and proved ineffective because of a lack 
of information. The shortage of data, as discussed above, was the main reason for 
building explanatory models rather than predictive ones. Another way to build better 
trees is to divide the data into random subsets, build the model using one subset as the 
training set, and then evaluate the tree with the rest of the data. 
Doing this with different subset variations until a good tree is built is another approach 
to building good models. However, we have the same problem as with cross-validation: 
insufficient information. 
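
The random-subset evaluation described above can be sketched as follows. This is only an 
illustration under stated assumptions: a one-split rule on a single predictor stands in for 
a full tree, and the function names and the toy force ratios and outcomes are hypothetical. 

```python
import random


def misclassification_rate(actual, predicted):
    """Share of observations whose predicted outcome differs from the actual one."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


def fit_stump(x, y):
    """Fit the rule 'predict 1 if x >= cut else -1' with the fewest training errors."""
    values = sorted(set(x))
    cuts = [(lo + hi) / 2.0 for lo, hi in zip(values, values[1:])]

    def errors(cut):
        return sum((1 if xi >= cut else -1) != yi for xi, yi in zip(x, y))

    return min(cuts, key=errors)


def holdout_rates(x, y, test_fraction=0.3, repeats=5, seed=0):
    """Repeatedly split the data at random, fit on one part, evaluate on the rest."""
    rng = random.Random(seed)
    n = len(x)
    n_test = max(1, int(round(n * test_fraction)))
    rates = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        cut = fit_stump([x[i] for i in train], [y[i] for i in train])
        predicted = [1 if x[i] >= cut else -1 for i in test]
        rates.append(misclassification_rate([y[i] for i in test], predicted))
    return rates


# Hypothetical toy data: force ratios and outcomes (1 = attacker wins, -1 = otherwise).
force_ratio = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
wina = [-1, -1, -1, 1, -1, 1, 1, 1, 1, 1]
rates = holdout_rates(force_ratio, wina)
print(rates)
```

With only a handful of observations in each test subset, the rates vary strongly from one 
random split to the next, which illustrates why this approach needs more data than was 
available here. 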


Our models, as mentioned previously, are explanatory as opposed to predictive 
models. Thus, there is another measure on which we can assess our models: the 
misclassification rate. 


The misclassification rate is the percentage of the data that the model classifies 
incorrectly; its complement is the percentage of the data that can actually be explained 
with the model. We will use it as our measurement to assess the 
models. Our models with the nationalities were presented in the previous section. To 
evaluate the importance of nationality, models without the nationality factors were also 
built. They will not be presented; instead, the misclassification rates with and without 
nationality will be compared. The next table contains those values. 

















               WITH NATIONALITY    WITHOUT NATIONALITY 
BEFORE WWI          0.244                 0.315 
DURING WWI          0.212                 0.250 
DURING WWII         0.216                 0.219 














Table 11. Misclassification Rates of the Trees with and without Nationality. 


As can be seen from the table, the effect of nationality, that is, the change in 
the misclassification rate when nationality is included, was largest before 
WWI. That is not a surprise, since nationality was the primary split in that 
period. An improvement of approximately 7 percentage points in the 
misclassification rate (0.315 to 0.244) occurred when nationality was used. 
During WWI, the improvement was 3.8 percentage points (0.250 to 0.212). 
Beginning with WWII, the nationality variable became a rather 
unimportant factor (0.219 to 0.216). 
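
The comparison can be reproduced directly from the values in Table 11. The short sketch 
below only restates that arithmetic; the dictionary layout and the function name are 
illustrative choices. 

```python
# Misclassification rates from Table 11: (with nationality, without nationality).
rates = {
    "Before WWI": (0.244, 0.315),
    "During WWI": (0.212, 0.250),
    "During WWII": (0.216, 0.219),
}


def improvement(with_nationality, without_nationality):
    """Reduction in the misclassification rate gained by including nationality."""
    return round(without_nationality - with_nationality, 3)


improvements = {period: improvement(*pair) for period, pair in rates.items()}
for period, gain in improvements.items():
    print(f"{period}: {gain:.3f}")
```

The differences are 0.071 before WWI, 0.038 during WWI and 0.003 during WWII, which is the 
declining pattern described above. 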


C. SUMMARY 

In this chapter, we analyzed the data set using the classification trees. Some of the 
important conclusions reached follow. 

• Nationality is the most important variable prior to WW I. 


• There is an obvious trend in history related to predicting the outcome of 
the battle: as time passes, technology and advanced weapons play 
a more important role in deciding the outcome of the battle. Force ratio 
was the decisive factor up to WW II, artillery and tank ratios were decisive in 
WW II, and the air force ratio has been decisive since then. However, as 
demonstrated, the countries we analyzed, the USA, Germany, Britain and Israel, 
usually use those weapons more effectively than the others. This consistent 
ability to use the most effective weapon systems is their characteristic. 
Thus, even if the exact figures about their force structure are not available, 
it would not be wrong to predict that they have enough to win the battle 
they are fighting. 


• The USA will almost certainly have an overwhelming force on the 
battleground to ensure they win. 
IV. CONCLUSION 


The analyses produced some interesting results. As we mentioned in the 
introduction chapter, our purpose was to find the importance of nationality on battle 
outcomes. For the analyses, we did the following: 


The analyses focused on four different countries: The USA, Germany, 
Israel and Britain. 


Since the nature of warfare evolves, the data set is divided into four 
periods: battles before WWI, WWI, WWII and the battles after WWII. 


By combining our findings from summary statistics and the tree models, we 
conclude the following: 


Relative variables are avoided intentionally and it is not a good idea to 
use them in the models. The reason for avoiding them is that they are 
subjective and hard to determine before a battle. In addition, we also found 
that the data set does not contain much information on the values of these 
variables. That is, according to the data, in the majority of the battles 
neither side has an advantage. In other words, even if one decides to use 
the relative variables in a model, it will be difficult to find discriminatory 
information in this data set. 


The tree models show that nationality was the most important factor in the 
battles before WWI. This is in line with the findings of Coban [Ref. 4], 
who found that in the battles before WWI, the relative variables are more 
important than objective variables. Here, we are using nationality as a 
surrogate for the relative factors. In this thesis, one of the questions we 
asked was whether we can replace all relative variables with just 
nationality. As the results demonstrate, we can replace the relative 
variables with nationality alone, when relative variables are important. 
Coban’s model for the battles before WWI has a misclassification rate of 
21 percent, and ours has 24 percent. Although the analysis methods have 
minor differences in that his is a predictive model whereas ours is an 
explanatory one, the comparison provides a good indication of the 
soundness of our model. 


The importance of weapons and technology has been increasing since the 
beginning of the 20th century. Also, the countries examined made 
consistent use of the weapons and technologies that affect the outcome of 
the battle. Therefore, even though we cannot determine the exact 
importance of nationality by examining the results of our tree models, we 
can conclude, when combining them with other analyses, that the four 
countries, the USA, Germany, Britain and Israel, are expected to have 
sufficient weapons on the battleground to win the battle. Considering the 
amount of data existing on the battles of the USA, it is easier for us to 
reach a conclusion about the USA’s nationality factor. That is, the USA 
has almost always had overwhelming military power, and this is a 
national characteristic of the USA. Looking at recent combat, this 
conclusion seems to be solid, even more so today. 


Although we conclude that it is the objective variables that are more 
associated with the outcome of the battle, we cannot say that an advantage 
in these guarantees success. The analyses in the second chapter showed 
that in most of the battles, no statistically significant difference exists 
between the relative variable values in the battles won or lost. This leads 
us to one truth about the phenomenon of warfare: in war, luck and some 
other factors that can neither be predicted nor even be named have a 
very big influence. 


A. FURTHER STUDY SUGGESTIONS 


We used only the variable nationA, the nationality of the attacking 
country, as our response variable in the analyses done in Chapter II, 
Summary Statistics. It will be helpful to see what the results are also using 
nationD, the nationality of the defending country. 


Although S-Plus is a very powerful software package, it does have some 
limitations. Several new algorithms related to classification trees are 
available in other software packages. For example, with the methods 
available in S-Plus, each split has only two branches, but, in Clementine, 
the user can decide the number of branches at each split. It will be 
interesting to see what the results are if splits are forced on each nation, in 
other words, have the tree grow in such a way that every branch from a 
split has a different nation. 


With specific countries, further analyses can be done in more detail by 
using other statistical analysis techniques. For example, with the amount 
of data available, it is possible to analyze the battles of the USA and find 
their specific characteristics. Then, combining the results from these 
analyses, tree models can produce more significant results. Using cluster 
analyses might also be a good choice to analyze the data set. 


We noted earlier that this data set is not the ultimate truth 
(Chapter II, Section D, Summary). Beyond that, in the analyses, we 
considered all battles equal, which, in our opinion, is a pitfall. Battles 
differ with respect to their size and the importance of their results. From 
the same data set, a different selection of battles can be made. 
Professional help from a historian might be useful to do this. For 
example, having more homogeneous subsets and discarding the battles 
with an unreasonable force structure, such as the ones in WW II in which 
the Allies had an incredible advantage over Germany, may help reach 
better conclusions. 
APPENDIX A. TABLES OF RELATIVE VARIABLES 


In this section, tables for relative variables analyzed in the summary statistics 
section are provided. There are four tables for each variable, all with respect to the time 
periods. The first two tables are for the battles where the countries attack and win, the last 
two are for the battles where they attack and lose. The first and third tables have the exact 
number of battles for all countries. The second and fourth tables are with the four 
countries we analyze, and have the proportion of the battles’ data. The reader may refer 
to p.26 for further explanation on how to read the tables correctly. 


A. “SURPA” 


SURPRISE ADVANTAGE ATTACKER WINS 


OVERALL 1600-1913 1913-1939 1939-1945 __ 1945-2000 
RowNames| OP ATOPOPTATOTOLALOPOLALToOPoOLAd | 
AUS | 6 | 4 | OF BT 4 OOF 3} oO} Of Of of oF of oo | 
a [i 2a BS ie a0 ee eos oo a) 

ce | oO | o| of o| 0] 0 | 


=a 


po foto} oi 
a on 21/10] of 4|1}ofotolofol{ol]o| 


en 22) ao (ee (oe Pe E07 Pes a Eee 
Is eT BT Of of; of; oto} of] of of of of 16] 13] 0 | 
oof 36 | 26] of 32; 15} ot4] 5} ofotitofol so | 
Eon Eee Od ESS Pea os RON MON 0s Hea eos ad en Rea ao 
RUSS {| Of 2;ofototofo|2{}ofoftotofzol oo 
Sov. [16] 6 |ofo}to}ofolt1] of] s|ofol oo | 
USA 88 [22] of 6] 4 | oi] 2] of ss] 6 of oj} oj} ol 


OVERALL 1600-1914 1913-1939 1939-1945 1945-2000 


COUNTRYT OT A|OPOTATOPOVATOPOTAToOTOTAd | 
| 0.20] 0.008080] 0.20] 0.00 RNS 0.39] 0.00 FRG} 0.10] 0.00 
| 0.34] 0.0080] 0.25] 0.00] | 0.00 NORA 0.29] 0.00 





BR 
ER | 0.43] 0.00R@RM8} 0.21] 0.00) | 0.00] | 0.00) 
| 0.45] 0.00) | 0.45] 0.00) 
SURPRISE ADVANTAGE ATTACKER LOSES 
OVERALL 1600-1913 1913-1939 1939-1945 1945-2000 


RowNames| OT ATOPOTATOPTOPLATOPOPLALToOPTOLALD | 
AUS | it 71 | OF 7] 1] ot4|o;otototoazoto| oa | 
BRL 23] 5] 3 E 9 to] 2s} of oye tit ofot ool 
cs__| tif 6 | 0) fii] 6 {| of o;oftofolj]ofofol oo. 
ENG [| 27a} ovat it opotofofototofyot oto. 
O_o pis} 1} 2t7}]o0lofoltoj}ofo]o]o| 


rasa 08 | 0.00) [i 2 a cea 
REINO Foal 100 Ina__|na__|na_| 
| 0.00} 0.33] Ina__{na_|na_| 
[0.00] 0.00 | 0.00} 0.00] 





B. “CEA” 


RELATIVE COMBAT EFFECTIVENESS ATTACKER WINS 


OVERALL 1600-1913 1913-1939 1939-1945 1945-2000 


RowNames| OT ATOPOLATOPTOPLATOPTOLALToOTOLAL dD | 





OVERALL 1600-1914 1913-1939 1939-1945 1945-2000 


OUNTRYP OFT ATOPOTATOPTOLATOPTOTAToOToOTAd | 
| 0.15] 0.00 ROO] 0.00] 0.00) Ina__|na__|na_| 


BR | 0.00] | 0.00] 0.47] 0.00}N@RSB§na_ [na |na_| 
| 0.49] 0.00 RN@RBS) 0.37] 0.00) | 0.00] | 0.00fna_|na_[na_| 
IS__ 0.00 FROG] 0.00fna__|na__na_| na_|na_|na_] 0.00 0G} 0.00 
RELATIVE COMBAT EFFECTIVENESS ATTACKER LOSES 

OVERALL 1600-1913 1913-1939 1939-1945 1945-2000 
RowNames| eo AT OA OP OA OOP AD 
AUS | OT OT BEST OT STAT OT of ot; ofofoltolo 
BR ert th Pit of oft +{ot4{of;{sfototoa 


| 0 
| 0 
| 0 
| 0 




| 2 
USA 44] 3 4 et ot iP ef of 3 p22] 3 {of 0 | 


OVERALL 1600-1914 1913-1939 1939-1945 1945-2000 


POTATOPOVTALTOPTOTALToOPoOTALToOToOPAtTd 
| 0.00] 0.06,N@RGM] 0.00] 0.33 FN@NGS] 0.09 0.29fna__[na__|na_| 
BR | 0.00] 0.07 5NGR92] 0.08] 0.00RNORSM] 0.00] 0.43fna__[na__|na_| 
| 0.00] 0.33 5N0888] 0.17] 0.0ORNGISS] 0.30] 0.04fna__|na__|na_| 
Bb0-00)na na [nara [pa [nara [na [na 0.14 L_ 0.00] 








Cc. “AEROA” 
AIR FORCE ADVANTAGE ATTACKER WINS 


OVERALL 1939-1945 1945-2000 


RowNames}| OP ATOPTOLPALoOTOLTA D | 
AUS | OF} OF OF O} OO} OF 0} 0 | 





AIR FORCE ADVANTAGE ATTACKER LOSES 


OVERALL 1939-1945 1945-2000 


D. “LEADA” 
LEADERSHIP ADVANTAGE ATTACKER WINS 


OVERALL 1600-1913 1913-1939 1939-1945 1945-2000 


RowNames| OT ATOPOTATOPTOLATOPTOLALoOPTOLAD | 
AUS | 6] 4] OF BS] 4 OF 3s} oO} oOFol;ot}ototlol ol 
BR | 28 | 8 | tT eal Soe i Oe Rod (ROSROs om 
cs TF OT ST oOFoTSs | ofFo}|ototolto}ototo|o| 
ENG [275] o7275]ofot;oftofoto}toyot oto. 
FR 14] 22] o Piof aif of 4{i1{ofololofotot|o | 


1 







OVERALL 1600-1914 1913-1939 1939-1945 1945-2000 


COUNTRY] O | AT 0. 
0.00] 
| 0.00] 


BR | 0.06 
; | 0.00] 
| 0.00fna__[na__{na_Jra__|na_|na__Jna_na 




0.03 


LEADERSHIP ADVANTAGE ATTACKER LOSES 





OVERALL 1600-1913 1913-1939 1939-1945 1945-2000 
RowNames| O [TA|TDODPO;TATOPOLTATOPOTALoOTOLA dD | 


AUS | 6 | Of] OS] OT SS | OO] 1 PToO;|ot}oto| ol ol 
BR | 22] 1 8 ae 2 Oe a ee Oe Or oe eos ow 


| 0 
ENG [| tf of] 2yit of 2q ol 
FR of i3{ 1 {| 9 P7 {i {sq e | 
fo | 0 | 0 | 

| oO | Oo; of 0 | 

| 7 | 2 | i9f 8 | 
Ti Sem | 0 
| 1 
| 0 | 


1 


2 2 
| Of O| 0 | 
| Of 2] 0 | 
| of oto. 
| i fo] 0 | 

| 2 | 0 | 
| 0 | 


fo fo | 0 | 
re 
| i fo | 0 | 
| 2 feof 2 ti 


| Of 0 | 
| Of 0 | 
| o | 0 | 
Ee ion 
IS 3 | Of 3 | 
| Of 0 
| Of 0 | 
| Of 0 | 
| Of 0 | 
| 6 f 0 


RUSS 
USA 42 | 0 | 19 | 


1 
| 270] o0/] of 0 | 2 
Ps Totof 5 | 


OVERALL 1600-1914 1913-1939 


| 0.00] 0.31} 0.47| 0.00] ; | 0.00] 0.17] 
BR :; | 0.00} 0.00 
| 0.09} 0.04 
L_ 0.00] 43 







E. “TRNGA” 
TRAINING ADVANTAGE ATTACKER WINS 


OVERALL 1600-1913 1913-1939 1939-1945 


RowNames| O TATOPOLTATOPTOLTATOPTOPATD 
AUS | 8] 2] OF SF | 2] oF sto; ofol of oO 
BR 2ty 7 oP st eT opto} i Pope fotsa | 
cs dT UT OT OF SH} oOtTofotoftofot} oo. 
ENG [| 4] 3/074] 3{ofFo}tofofot oto. 
f3}o}2{7o]o] o| 


[6 | 2] 0 
Is OP e9f OF OF Of OF OF 0 | 
Oo 54] 5 | 3 p42] 3 2 8 | 1 | 
PR sf 10] 1 | OF to] 1 | OF OF 0 | 
RUSS | 2] oo; ofototogye2 | o | 
SOV. 47] 6 | 5 Oo] of of 4] 0 | 
pi2] 8st ot7 {oo 


OVERALL 1600-1914 1913-1939 1939-1945 


POJATOPTOTATOTOTAToToOT A 
| 0.40] 0.00} 0.23] 0.00] 
BR | 0.00) 

| 0.16] 0.00,N@RMH 0.29] 0.00) 0.05 




1945-2000 
POTATO | 
| oO | 0 | 0 | 
| Oo | o | 0 | 
| o | 0 | 0 | 
PO} oO} 0 | 
fo | 0 | 0 | 








| 0.00 ROG} 0.00) na_|na_[na_| 


TRAINING ADVANTAGE ATTACKER LOSES 

OVERALL 1600-1913 1913-1939 1939-1945 
RowNames| OPAL OPOPALPOTOPAPoOPOPALD 
AUS | 11] Of 1 7] o;1P4]o}ofoto{o| 
BR 24; 4] 37] 4]ofis}otot4{o]s3| 


OVERALL 


COUNTRY] OT AT DT OTA] 
0.00] 0.06] 0.00) 0.00 
BR 0.00) 
0.00] 0.33 
ro.00fna_|na [na _| 
1945-2000 
POT ALD 
| O | 0} 0 | 







F. “MORALA” 
MORALE ADVANTAGE ATTACKER WINS 


OVERALL 1600-1913 1913-1939 1939-1945 1945-2000 


RowNames| O | ATOPOTATOPTOLATOPTOLALoOPTOLAD | 
AUS | 10] Of] OF 7] Oost} o};oftotot}otzot|o|o| 
BR (327; 56 Pov et i tor7 {4 opiw{ol;ofololol 
css A Tt To at i fT ofo}ototol;o}otolo)o | 
ENG [| 7{o;ot7{oftoyototofototofyototo. 
FR | 26] 9 | ft | Pit 4}ofoj;otofolo]o| 
GER | 35 | 7 | OF 8 | al 
IS 19 | 10 | 0 | 0 | 


| Oo | of ol] o | 
| oO | of ol] o | 
}i9{[ 17 0] 0 | 
Li4t opto] o| 


PR | 11 | 

RUSS | 2 | 0 | 0 | 

| O_ 
USA. 60 | 50] 0 




OVERALL 1600-1914 1939-1945 1945-2000 


ae zeae PO} AT OD 
| 0.45] 0.00RM@RMB} 0.25] 0.00f 0.00) | 0.24{ 0.00fna__[na__|na_| 
0.14] 0.00}mgmaa 0.06] 0.00] 
| 0.13] 0.0OR@B] 0.00] 0.00) 
| 0.34] 0.00) Ina__[na__[na__fna_|na_|na_ (DIB) 0.34] 0.00 





MORALE ADVANTAGE ATTACKER LOSES 
OVERALL 1600-1913 1913-1939 1939-1945 1945-2000 


SOMES ce a POPTALOPOPALTOPOPTAToOPOPATD 
A A tt to |} 4] o}ofototototo|o 


2 





ine) 4 4 


| 0 

on 

| 0 | 

| 2 | | 0 | 

USA 47] 14] 0 | 0 


OVERALL 1600-1914 1913-1939 1939-1945 1945-2000 


COUNTRY ee POTATOTOLA 
SA | 0.23] 0.00 | 0.14] 0.00 


| 0.06 0.00} 0.00] 
BRB. 0.03| 0.00FNGNSS] 0.07] 0.00 0.901 0.00 Pact 6.0 Ina__|na__|na_| 
0.00] 0.00fm@i@al 0.00] 0.08| 4fna_|na_|na_| 
na_jna__ na ae ne 
G. “LOGSA” 
LOGISTICS ADVANTAGE ATTACKER WINS 


OVERALL 1600-1913 1913-1939 1939-1945 1945-2000 


RowNames| OT ATOPOLATOPTOPLTATOPTOPLALoOPTOPA Ld | 
AUS | OT it PT OT eT tT Tot stTot;ofot;otofoy|o| o | 
BRO {st} 4 eae ToT of to} ott | Oo | o | 0 | 


| 0.05| 0.00 ROG] 0.00] 0.00 
| 0.09] 0.05 MOG} 0.00] 0.00RNGR8H] 0.00] 0.09} 
| 0.09] 0.06 RN0895) 0.05] 0.00) 
[0.00] 0.00 [0.00] 0.00 





LOGISTICS ADVANTAGE ATTACKER LOSES 
OVERALL 1600-1913 1913-1939 _ 1939-1945 _1945-2000 
RowNames| OT ATDPEOPTATDOPOPATOPOPALoOPO PAD | 
AUS | 11] Of 1 T8{[oftots{otifototofo|o|o| 


| 







OVERALL 1600-1914 1913-1939 1939-1945 1945-2000 


OUNTRYP OT ATOPOTATOPTOLATOPToOTAToOToOTAd | 
| 0.00] 0.00 ROG} 0.00] 0.00) 
BRFSS} 0.06] 0.09} | 0.00] 0.15 


| 0.00} 0.00RN@R82] 0.00] 0.08] 





H. “MOMNTA” 
MOMENTUM ADVANTAGE ATTACKER WINS 


OVERALL 1600-1913 1913-1939 1939-1945 1945-2000 


REUNEGES Mee — —— POTALTOTOLTATD | 





OVERALL 1600-1914 1913-1939 1939-1945 1945-2000 


| 0.33] 0.00} 30 | 0.16] 0.00 RM] 0.42] 0.00fna__[na__[na_| 
| 0.20] 0.00 ; | 0.45] 0.00RNORSS] 0.12] 0.00fna__|na__[na_| 
| 0.36] 0.00 ; | 0.21} 0.00 | 0.00fna__|na_[na_| 
[0.45] 0.00 [0.45] 0.00 





MOMENTUM ADVANTAGE ATTACKER LOSES 
OVERALL 1600-1913 1913-1939 1939-1945 1945-2000 
RowNames| OPAL DP OPALPOLTOPALPOPOPALPOPOPALD 
AUS |] 12]; Oo] OF 8B} O| OF 4] OT oTo};of;ofototl do 





OVERALL 1600-1914 1913-1939 1939-1945 1945-2000 


| 0.06] 0.00 ROB} 0.00] 0.00) 
| 0.15] 0.00 FGI] 0.07] 0.00FNORGS] 0.31] 0.00 


| 0.24) 0.00RMROG] 0.00] 0.00RNONGM 0.33] 0.00) ; 





I. “INTELA” 
INTELLIGENCE ADVANTAGE ATTACKER WINS 


OVERALL 1600-1913 1913-1939 1939-1945 1945-2000 


RowNames| OT ATOPOPLATOPTOPLATOPTOPLALToOTOPA LD | 
AUS | BT AT OT Bat 4 tT oF Bt] of ofot;ol;ofol oo | 
BR [307 ST 2Petiltegreslte2yvops| 2yofololoal 
cS__| 3 {2 { 0) 3] 2{;ofotoftofotolo 


| 13 | 0 | 
| 0 


OVERALL 1600-1914 1913-1939 1939-1945 1945-2000 


PA} D 
| 0.00) | 0.00] 0.02} 
| 0.18] 0.00 fNORSS] 0.12] 0.00) 
| 0.21| 0.00RNORBAT 0.36] 0.00] 0.40] 0.40] 0.20) 


|_0.07} 0.00} 


| 0 | 
| 0 
| 0 
| 0 
RoW 
| 0 
| 0 





i) 





INTELLIGENCE ADVANTAGE ATTACKER LOSES 
OVERALL 1600-1913 1913-1939 1939-1945 1945-2000 
RowNames| OP AT OPO PAPO LTOPAPO POLAT OP OPALD 
AUS | BL 7] SPOT TT tPA} ot; e2to} of} ofo}l ol ol 


— 1914 —— 1939 —— 1945 — 2000 


DOERE roca a0 TO ED 7 aaa 
OLCONORES [0.00] 0.29)mia@0] 0.00] 0.000886] 0.00[ 0.14na [na _|na_| 
| 0.00] 0.33 0.13fna_|na_|na_| 
[0.00] 0.14 [0.00] 0.14 
J. “TECHNA” 
TECHNOLOGY ADVANTAGE ATTACKER WINS 


OVERALL 1600-1913 1913-1939 1939-1945 1945-2000 


RowNames| OT ATOPOLTATOPTOLATOPTOLALoOTOLAD | 
AUS | 6 |] 4A] OF BS] AT OF S| oO} otol;ot}ototlo| ol 
BR [247 ist of 7] 27yoTs ite tvopa| stofolofol 
cs__| 3 {2 { 0) P33] 2{;ofototofo}oftofol oo. 
ENG [| 5] 2; o7s5]2]ofot;oftofoto}toyot oto. 
FR os pat} io} of 4] 1}ofoto}ofo]o]o| 


GER 22 | 20] 0 | a ea 
IS 16 | 13] OF 0 | | oO | 0 | 
OO 36 | 26 | 0 
PR =f 8B] 3] OF 8 | | oO | 0 | 
RUSS {| 0 | 2 | 0 f 0 | | oO | 2 | 
SOV. | 16 | 6 | Of 0 | | Oo | 1 | 
USA 88 | 22] of 16] 4] 0 


OVERALL 1600-1914 1913-1939 1939-1945 1945-2000 


| 0.20] 0.00FN@R8O] 0.20] 0.00 

| 0.34) 0.00RNORMS] 0.25] 0.00) 

| 0.43] 0.00RNORM8] 0.21] 0.00) 
[0.45] 0.00 










TECHNOLOGY ADVANTAGE ATTACKER LOSES 
OVERALL 1600-1913 1913-1939 1939-1945 1945-2000 
RowNames| OP AT DP OPALPOLTOPALPOPOPALPOPTOPALD 
AUS] 12] Oo] OF 8B} O| OF 4] OL oTo};of;ofotol da 





OVERALL 1600-1914 1913-1939 1939-1945 1945-2000 





| 0.00] 
[0.00[-0.00fna__Jna [na _| Ina__[na_|na (NO 0.00] 0.00 
K. “INITA” 
INITIATIVE ADVANTAGE ATTACKER WINS 
APPENDIX B. BOXPLOTS OF OBJECTIVE VARIABLES 


This section has the boxplots that are not listed for the objective variables 
analyzed in the second chapter. The reader may refer to p.15 for more explanation on the 
boxplots. 


A. FORCE RATIO 

Force Ratio During WWI 

C. AIR FORCE RATIO 

Air Force Ratio During WWII 
APPENDIX C. ACRONYMS 


COUNTRY NAMES 

AUS: Austria 
ENG: England 
BR: Britain 
GER: Germany 
PRUSS: Prussia 
IS: Israel 
USA: United States of America 
SOV: USSR 
RUSS: Russia 
CS: Confederate States (Present only in the battles of American Civil War) 
TU: Turkey 
EG: Egypt 
SYR: Syria 